Combining Q-Learning and a Hybrid Learning Approach in a Soccer Agent System

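RoboCup (the Robot World Cup) is an international competition that has been held since 1997. Its vision is that by 2050 a team of robot soccer players will be able to play a match against the reigning human World Cup champions under the rules of the Fédération Internationale de Football Association (FIFA), and win. Academically, it provides an excellent test bed for machine learning.

Because the state of the field changes constantly during a soccer match, how to let soccer agents learn autonomously so that they respond in the best way is an important issue. In this study the coach agent continues to use the hybrid learning approach, while the player agents are given Q-Learning, a reinforcement learning method. The aim is for the coach and the players to have learning ability at the same time, raising the overall strength of the team.

To address the problem that the number of environment states on the field is too large and therefore slows learning, this study applies fuzzy theory and fuzzy rule inference to reduce both the number of states and the complexity of the state-action table. In the implementation, a Q-value is written into the state-action table only when the corresponding entry is first executed, which greatly reduces the amount of data in the table, lowers the load on system resources, and improves execution efficiency.

Finally, we build this soccer team on the RoboCup Soccer Simulation Platform and run experiments to examine the learning effect and the team's execution efficiency when the players and the coach both have learning ability.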
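The lazy writing of Q-values into the state-action table can be illustrated with a minimal sketch. It is not the thesis implementation: the class name `LazyQTable`, the action names, the state labels, and the values of `alpha` and `gamma` are all assumptions chosen for illustration. The table is a plain dictionary, so a (state, action) pair occupies memory only after it has been updated for the first time.

```python
class LazyQTable:
    """Sparse state-action table: an entry exists only after its first update.

    Hypothetical illustration; names and parameter values are assumptions.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, default_q=0.0):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.default_q = default_q  # value assumed for unseen pairs
        self.table = {}             # (state, action) -> Q-value

    def q(self, state, action):
        # Reading never allocates an entry; unseen pairs fall back to the default.
        return self.table.get((state, action), self.default_q)

    def best_action(self, state):
        # Greedy choice over the known actions.
        return max(self.actions, key=lambda a: self.q(state, a))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; the entry is created here on first use.
        best_next = max(self.q(next_state, a) for a in self.actions)
        old = self.q(state, action)
        new = old + self.alpha * (reward + self.gamma * best_next - old)
        self.table[(state, action)] = new
        return new


# Usage: one update creates exactly one entry in the table.
q = LazyQTable(actions=["pass", "dribble", "shoot"])
q.update(("near_goal", "has_ball"), "shoot", reward=1.0,
         next_state=("near_goal", "no_ball"))
print(len(q.table), q.best_action(("near_goal", "has_ball")))
```

With three actions and many possible states, a dense table would reserve a cell for every combination up front; here only the visited pairs are stored.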

Keywords

Q-Learning; Fuzzy; Machine Learning; Multi-Agent System; RoboCup

English Abstract

RoboCup is an international competition that began in 1997. Its mission is: “By the middle of the 21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup.” Academically, RoboCup provides an excellent test bed for machine learning. In a soccer game the environment state changes constantly, so how to make a soccer agent learn autonomously to respond in the best way has become an important issue. The paper “Applying Hybrid Learning Approach to RoboCup's Strategy” discusses the hybrid learning approach in this field. Carrying that concept forward, this paper continues to apply the hybrid learning approach to the coach agent, while applying the Q-Learning method to the player agents. Furthermore, to deal with the excessive number of environment states, which slows learning, we use fuzzy states and fuzzy rules to reduce the state space and to simplify the state-action table of Q-Learning. Finally, we build a soccer team in which both the coach agent and the player agents have learning ability in the RoboCup Soccer simulator. Through experiments, we analyze and compare the learning effects and the execution efficiency.
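How fuzzy states shrink the state space can be sketched as follows. This is an illustrative assumption, not the membership functions used in the thesis: the linguistic terms `near`, `medium`, and `far`, their breakpoints in metres, and the function names are hypothetical. A continuous distance is mapped to the term with the highest triangular membership, so an unbounded range of readings collapses into three discrete states.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for the distance to the ball, in metres.
DISTANCE_TERMS = {
    "near": (-1.0, 0.0, 20.0),
    "medium": (10.0, 30.0, 50.0),
    "far": (40.0, 60.0, 121.0),
}


def memberships(distance):
    # Degree to which the distance belongs to each linguistic term.
    return {name: triangular(distance, *abc)
            for name, abc in DISTANCE_TERMS.items()}


def fuzzy_state(distance):
    # Pick the term with the highest membership as the discrete state.
    # Distances outside every term get 0 everywhere; a fuller version
    # would use shoulder-shaped functions at both ends.
    degrees = memberships(distance)
    return max(degrees, key=degrees.get)


# Usage: three different readings map to three discrete states.
for d in (5.0, 30.0, 100.0):
    print(d, fuzzy_state(d))
```

If a second variable (for example, distance to the goal) were fuzzified into the same three terms, the table would need at most 3 × 3 = 9 states per action instead of one per distinct pair of readings.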

English Keywords

Q-Learning; Fuzzy; Machine Learning; Multi-Agent System; RoboCup

References

[33] Y. H. Lin, “The Construction of Fuzzy Linguistic Numbers for Questionnaire and Its Empirical Study,” Survey Research - Method and Application, vol. 11, pp. 31-71, 2002.

[3] J. Y. Kuo, F. C. Huang, S. P. Ma, and Y. Y. Fanjiang, “Applying Hybrid Learning Approach to RoboCup's Strategy,” Journal of Systems & Software, 2013.

[5] L. Waltman and U. Kaymak, “A Theoretical Analysis of Cooperative Behavior in Multi-agent Q-learning,” IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 84-91, 2007.

[6] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338-353, 1965.

[7] M. E. Bratman, “Intention, Plans, and Practical Reason,” Harvard University Press, 1987.
