In recent years, many reinforcement learning (RL) methods have been proposed and applied to a variety of problems, in which agents acquire policies that maximize their total reward. Observing that an agent's policy can be improved effectively by a supervised learning mechanism trained on stored records of the agent's behavior and rewards, we previously proposed a system that improves an RL agent's policy with a mixture model of Bayesian networks. This paper employs two types of mixture models and introduces a new technique that allows agents to adapt to dynamic environments. We investigated the adaptability of our system to environmental changes and compared the properties of the new technique with those of the previous one.
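The core idea of policy improvement by supervised learning on stored behavior and reward data can be illustrated with a minimal sketch. The snippet below is not the paper's method: in place of a mixture of Bayesian networks it fits the simplest possible model, a single state-to-action conditional table estimated from the highest-reward stored episodes, and all function names and the toy environment are hypothetical.

```python
import random
from collections import defaultdict

def collect_episode(policy, env_step, start_state, length=10):
    """Roll out one episode; return its (state, action) pairs and total reward."""
    state, transitions, total = start_state, [], 0.0
    for _ in range(length):
        action = policy(state)
        next_state, reward = env_step(state, action)
        transitions.append((state, action))
        total += reward
        state = next_state
    return transitions, total

def fit_policy(episodes, top_frac=0.5):
    """Supervised step: estimate P(action | state) by counting over the
    best-rewarded fraction of the stored episodes (a stand-in for the
    mixture-of-Bayesian-networks model), then act greedily on it."""
    episodes = sorted(episodes, key=lambda e: e[1], reverse=True)
    keep = episodes[: max(1, int(len(episodes) * top_frac))]
    counts = defaultdict(lambda: defaultdict(int))
    for transitions, _ in keep:
        for state, action in transitions:
            counts[state][action] += 1
    def policy(state):
        if state in counts:  # greedy action under the learned conditional
            return max(counts[state], key=counts[state].get)
        return random.choice([0, 1])  # unseen state: fall back to random
    return policy

# Toy two-state environment: reward 1 when the action matches the state.
random.seed(0)
env_step = lambda s, a: (random.randint(0, 1), 1.0 if a == s else 0.0)
rand_policy = lambda s: random.randint(0, 1)
stored = [collect_episode(rand_policy, env_step, random.randint(0, 1))
          for _ in range(200)]
improved = fit_policy(stored)
```

In this toy setting the improved policy learns to echo the state, because matching actions dominate the high-reward episodes it was fitted on; adapting to a dynamic environment would amount to refitting on a sliding window of recent episodes.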