
應用於電腦圍棋之蒙地卡羅樹搜尋法的新啟發式演算法

New Heuristics for Monte Carlo Tree Search Applied to the Game of Go

Advisors: 林順喜, Rémi Coulom

Abstract


Research into computer Go started around 1970, but Go-playing programs were never considered truly strong until 2006, when the new search schemes Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT) appeared on the scene. The revolution brought by MCTS and UCT advanced computer Go to such a degree that people began to believe that Go-playing programs would be able to defeat the top human players within ten or twenty years. In this research, we propose new heuristics for MCTS, with two main contributions. The first is the successful application of Simulation Balancing (SB), an algorithm for training the parameters of the simulation policy, to 9×9 Go. When Silver and Tesauro proposed SB in 2009, it was tested only on smaller boards; our experiments are the first to demonstrate its effectiveness in 9×9 Go, showing that SB surpasses the well-known supervised learning algorithm Minorization-Maximization (MM) by about 90 Elo. The second contribution is a systematic study of various time management schemes for 19×19 Go. The results clearly indicate that clever time management can considerably improve playing strength. All experiments were performed with our Go-playing program ERICA, which benefited from these heuristics and experimental results to win the gold medal in the 19×19 Go tournament at the 2010 Computer Olympiad.
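
The abstract names UCT without stating its selection rule. For reference, the generic UCB1-based rule (a standard formulation, not ERICA-specific tuning; the exploration constant is implementation-dependent) selects, at each tree node, the child a maximizing

\[
\frac{w_a}{n_a} + c \sqrt{\frac{\ln N}{n_a}},
\]

where w_a is the number of winning simulations through child a, n_a is its visit count, N is the visit count of the parent node, and c is an exploration constant (c = \sqrt{2} in the original UCB1 analysis).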
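
Simulation Balancing adjusts the weights of a softmax playout policy so that the average playout result from a training position tracks a target value for that position; target values are typically obtained from deeper searches on the training positions. The sketch below follows the two-stage gradient estimator described by Silver and Tesauro. It is a minimal illustration under assumed interfaces, not ERICA's implementation: the rollout callback, the binary-feature representation, and all parameter names are assumptions made here for clarity.

import math

def playout_policy_probs(theta, feature_sets):
    """Softmax playout policy: P(move) is proportional to exp(theta . phi(move)).
    theta is a list of feature weights; feature_sets[i] is the set of active
    (binary) feature indices of the i-th legal move."""
    scores = [math.exp(sum(theta[f] for f in feats)) for feats in feature_sets]
    total = sum(scores)
    return [s / total for s in scores]

def simulation_balancing_step(theta, target_value, rollout, alpha=0.001, m=100, n=100):
    """One stochastic-gradient step of Simulation Balancing (a sketch, not ERICA's code).

    rollout(theta) is an assumed callback: it plays one playout from the training
    position with the current policy and returns (z, trace), where z in {0, 1} is
    the playout outcome and trace is a list of (chosen_feats, probs, feature_sets)
    tuples, one per playout move.  target_value is V*(s) for that position."""
    # Stage 1: Monte Carlo estimate of the current playout value V(s; theta).
    v_hat = sum(rollout(theta)[0] for _ in range(m)) / m

    # Stage 2: estimate g = E[ z * sum_t grad_theta log pi(a_t | s_t) ].
    g = [0.0] * len(theta)
    for _ in range(n):
        z, trace = rollout(theta)
        if z == 0:
            continue                      # a zero outcome contributes nothing to z * grad
        for chosen_feats, probs, feature_sets in trace:
            for f in chosen_feats:        # phi(chosen move)
                g[f] += z / n
            for p, feats in zip(probs, feature_sets):
                for f in feats:           # minus expected phi under the policy
                    g[f] -= z * p / n

    # Move theta so that the playout value tracks the target value.
    for f in range(len(theta)):
        theta[f] += alpha * (target_value - v_hat) * g[f]
    return theta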
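
For time management, the abstract reports only that clever schemes help on 19×19 boards. As one concrete illustration (a generic baseline-plus-extension heuristic sketched under assumed interfaces, not the specific formulas compared in the thesis), a program can divide its remaining main time over the expected number of remaining moves and then think longer when the root decision still looks unstable:

def base_allocation(remaining_time, move_number, expected_length=160, min_moves_left=20):
    """Baseline allocation (a generic sketch): spread the remaining main time over
    the moves the game is still expected to last, never assuming fewer than
    min_moves_left moves remain."""
    moves_left = max(expected_length - move_number, min_moves_left)
    return remaining_time / moves_left

def choose_move_with_time_management(root, remaining_time, move_number,
                                     run_search, extension=0.5):
    """Spend the base allocation, then extend the search once if the decision is
    unstable, i.e. the most-visited child and the child with the best win rate
    disagree.  run_search(root, seconds) is an assumed interface that grows the
    MCTS tree in place; root.children are assumed to expose .visits, .wins, .move."""
    budget = base_allocation(remaining_time, move_number)
    run_search(root, budget)
    most_visited = max(root.children, key=lambda c: c.visits)
    best_winrate = max(root.children, key=lambda c: c.wins / max(c.visits, 1))
    if most_visited is not best_winrate:           # unstable decision: think longer
        run_search(root, budget * extension)
        most_visited = max(root.children, key=lambda c: c.visits)
    return most_visited.move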
