透過您的圖書館登入
IP:18.191.216.163
  • 學位論文

以增強式學習策略解珠璣妙算問題

Finding Efficient Solutions for Mastermind Puzzle using Reinforcement Learning

指導教授 : 呂威甫
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


珠璣妙算是一個稱為「公牛與母牛」的傳統puzzle遊戲的現代版本,其起源可以追溯到中世紀。1970年代Knuth的論文開啟了珠璣妙算的研究,至今超過30年以上。這個問題一直受到作業研究、機器學習等相關研究人員所重視,近年來也有許多新的研究成果。本論文以增強式學習的策略進行珠璣妙算遊戲,除了學習密碼破解的最佳策略外,也學習密碼建構的最佳策略。我們使用增強式學習中的蒙地卡羅法及Sarsa()評估動作價值函數,並更進一步使用線性函數做為動作價值函數的近似。由實驗的結果可知,我們目前的平均猜測次數是4.294次優於Most Part的4.373,在密碼建構的研究方面,我們成功增加了其它方法的平均猜測次數,使Most Part的平均猜測次數由 4.373 增為 4.635。

並列摘要


Mastermind is a modern version of a traditional puzzle called bulls and cows that traces back to the Middle Ages. The art of solving the Mastermind puzzle was initiated by Donald Knuth and is already more than thirty years old; despite that, it still receives much attention in operational research and machine learning communities, and still has several new research results recently. In this thesis, we play the Mastermind puzzle using strategies of reinforcement learning. In addition to learning the optimal policy to guess secret code, we also learn the optimal policy to construct secret code. We use the Monte-Carlo method and Sarsa() to estimate the state-action value function, and use linear function to approximate the action-value function further. The experimental results show that the expected number of guessing of our approach is 4.294, which is better than most part’s 4.373. In secret code construction, we increase numbers of guessing of other methods successfully, such as the expected number of guessing of most part increasing to 4.635 from 4.373.

參考文獻


[4]Merelo-Guerv´os, J.J., Castillo, P., Rivas, V. Finding a needle in a haystack using hints and evolutionary computation: the case of evolutionary MasterMind. Applied Soft Computing 6(2), 170–179, (2006).
[5]Neuwirth, E. Some strategies for mastermind. Zeitschrift fur Operations Research. Serie B 26(8), B257–B278 (1982).
[6]Richard, S., Sutton, Andrew G., Barto. Reinforcement Learning: An Introduction, The MIT Press, Cambridge, (2002).
[7]Runarsson, T.P., Merelo, J.J. Adapting heuristic Mastermind strategies to evolutionary algorithms. In: NICSO 2010 Proceedings. LNCS. Springer, Heidelberg (2010).
[1]Irving, R.W.: Towards an optimum mastermind strategy. Journal of Recreational Mathematics 11(2), 81–87, (1978-1979)

延伸閱讀