以增強式學習策略解珠璣妙算問題

珠璣妙算是一個稱為「公牛與母牛」的傳統puzzle遊戲的現代版本，其起源可以追溯到中世紀。1970年代Knuth的論文開啟了珠璣妙算的研究，至今超過30年以上。這個問題一直受到作業研究、機器學習等相關研究人員所重視，近年來也有許多新的研究成果。本論文以增強式學習的策略進行珠璣妙算遊戲，除了學習密碼破解的最佳策略外，也學習密碼建構的最佳策略。我們使用增強式學習中的蒙地卡羅法及Sarsa()評估動作價值函數，並更進一步使用線性函數做為動作價值函數的近似。由實驗的結果可知，我們目前的平均猜測次數是4.294次優於Most Part的4.373，在密碼建構的研究方面，我們成功增加了其它方法的平均猜測次數，使Most Part的平均猜測次數由 4.373 增為 4.635。

關鍵字

珠璣妙算；機器學習；增強式學習；蒙地卡羅法

並列摘要

Mastermind is a modern version of a traditional puzzle called bulls and cows that traces back to the Middle Ages. The art of solving the Mastermind puzzle was initiated by Donald Knuth and is already more than thirty years old; despite that, it still receives much attention in operational research and machine learning communities, and still has several new research results recently. In this thesis, we play the Mastermind puzzle using strategies of reinforcement learning. In addition to learning the optimal policy to guess secret code, we also learn the optimal policy to construct secret code. We use the Monte-Carlo method and Sarsa() to estimate the state-action value function, and use linear function to approximate the action-value function further. The experimental results show that the expected number of guessing of our approach is 4.294, which is better than most part’s 4.373. In secret code construction, we increase numbers of guessing of other methods successfully, such as the expected number of guessing of most part increasing to 4.635 from 4.373.

並列關鍵字

Mastermind ； Machine Learning ； Reinforcement Learning ； Monte Carlo methods ； Sarsa

參考文獻

[4]Merelo-Guerv´os, J.J., Castillo, P., Rivas, V. Finding a needle in a haystack using hints and evolutionary computation: the case of evolutionary MasterMind. Applied Soft Computing 6(2), 170–179, (2006).

[5]Neuwirth, E. Some strategies for mastermind. Zeitschrift fur Operations Research. Serie B 26(8), B257–B278 (1982).

[6]Richard, S., Sutton, Andrew G., Barto. Reinforcement Learning: An Introduction, The MIT Press, Cambridge, (2002).

[7]Runarsson, T.P., Merelo, J.J. Adapting heuristic Mastermind strategies to evolutionary algorithms. In: NICSO 2010 Proceedings. LNCS. Springer, Heidelberg (2010).

[1]Irving, R.W.: Towards an optimum mastermind strategy. Journal of Recreational Mathematics 11(2), 81–87, (1978-1979)

Google Scholar

國際替代計量

以增強式學習策略解珠璣妙算問題

未授權

主題瀏覽