Traditional reinforcement learning algorithms, such as Q-learning, are built on a single agent learning one step at a time without a model. In recent years, many researchers have therefore proposed multi-agent approaches and model-based repeated learning to address low learning efficiency, such as Dyna-Q and multi-agent systems. In this thesis, we integrate algorithms from several different domains, apply their concepts to reinforcement learning, and extend existing ideas such as Dyna-Q and the multi-agent system. We add the UCB algorithm to agent exploration to improve exploration efficiency and shorten the time needed to build the virtual environment model. For Dyna-Q's virtual environment model, we introduce an image-processing concept to sharpen the model. We also propose a planning algorithm that parallelizes over the environment's state space, enabling parallel computation to accelerate Dyna-Q learning, and we incorporate prioritized sweeping into it to further improve planning efficiency and make effective use of computational resources. Based on these extensions and integrations, we use the concept of GPGPU (General-Purpose Computing on Graphics Processing Units) to implement the simulation on the CUDA (Compute Unified Device Architecture) platform, and verify through simulation how the proposed methods affect the learning speed of Dyna-Q.
Traditional reinforcement learning algorithms, such as Q-learning, are based on a single agent that learns one step at a time without a model. In recent years, many researchers have proposed the concepts of multi-agent learning and model-based repeated learning to improve learning efficiency, such as Dyna-Q and the multi-agent system. In this thesis, we integrated several algorithms from different domains, applied their concepts to reinforcement learning, and made extensions to existing concepts such as Dyna-Q and the multi-agent system. We added the UCB algorithm to improve the exploration efficiency of agents and shorten the time required to build the virtual environment model. For the virtual environment model of Dyna-Q, we introduced a concept from image processing to sharpen the model. We also proposed a planning algorithm that parallelizes over the environment's state space, which can perform parallel computing and accelerate Dyna-Q learning; the concept of prioritized sweeping was integrated to further increase planning efficiency and make better use of computational resources. After extending and integrating the above algorithms, the concept of GPGPU (General-Purpose Computing on Graphics Processing Units) was applied to implement the simulation on the CUDA (Compute Unified Device Architecture) platform, and the simulation was used to verify the impact of the proposed methods on the learning speed of Dyna-Q.
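To make the building blocks named above concrete, the following is a minimal single-agent sketch of the Dyna-Q loop (direct Q-learning update, model learning, and simulated planning) with UCB-style action selection during exploration. The toy corridor environment, hyperparameters, and function names are illustrative assumptions for this sketch, not the thesis implementation, which further parallelizes planning over the state space on CUDA and replaces uniform sampling with prioritized sweeping.

```python
import math
import random
from collections import defaultdict

def ucb_action(Q, counts, state, actions, c=1.0):
    """UCB1-style selection: exploit high Q-values while granting an
    exploration bonus to rarely tried actions in this state."""
    total = sum(counts[(state, a)] for a in actions) + 1
    def score(a):
        n = counts[(state, a)]
        if n == 0:
            return float('inf')  # always try an untried action first
        return Q[(state, a)] + c * math.sqrt(math.log(total) / n)
    return max(actions, key=score)

def dyna_q(env_step, start, goal, actions, episodes=50,
           alpha=0.1, gamma=0.95, n_planning=10, c=1.0, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)       # (state, action) -> value
    counts = defaultdict(int)    # visit counts for the UCB bonus
    model = {}                   # learned model: (s, a) -> (reward, s')
    for _ in range(episodes):
        s = start
        while s != goal:
            a = ucb_action(Q, counts, s, actions, c)
            counts[(s, a)] += 1
            r, s2 = env_step(s, a)
            # (1) direct RL: one-step Q-learning update from real experience
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            # (2) model learning (deterministic environment assumed)
            model[(s, a)] = (r, s2)
            # (3) planning: replay n simulated transitions from the model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                pbest = max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q

# Toy 1-D corridor: states 0..5, actions move left/right, reward 1 at state 5.
def step(s, a):
    s2 = min(5, max(0, s + a))
    return (1.0 if s2 == 5 else 0.0), s2

Q = dyna_q(step, start=0, goal=5, actions=(-1, 1))
```

Prioritized sweeping would replace the uniform `rng.choice` in the planning loop with a priority queue ordered by the magnitude of each state-action pair's expected update, so planning effort concentrates where value estimates change most.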