
Implementation of the Hex Game and an Application of Reinforcement Learning

Hex Game Implementation and Reinforcement Learning

Advisor: 林順喜

Abstract


Hex, also known as Nash, is a two-player game. It first appeared, under the name Polygon, in an article published in the Danish newspaper Politiken on December 26, 1942. In 1948 the mathematician John Forbes Nash independently reinvented it, and at first it was called Nash. In 1952 the game manufacturer Parker Brothers published it commercially under the name Hex.

On a Hex board the two players place stones in turn. Each player owns one pair of opposite sides and wins by occupying cells that form a connected path between those two sides. John Forbes Nash proved, via the strategy-stealing argument, that the first player in Hex has a winning strategy, and for boards smaller than 8×8 all winning strategies have been completely worked out.

This research applies the training method described in the AlphaZero paper, combining Monte Carlo tree search with neural-network training. Starting from zero human knowledge, with only the game rules provided, we use reinforcement learning on 3×3 and 4×4 Hex boards to train a program that learns on its own to completely solve 3×3 and 4×4 Hex. Following this approach, the possibility of solving Hex on larger boards can be explored in the future when more computing resources are available.

Keywords

Hex, Reinforcement Learning, Deep Learning

Abstract (English)


Hex, also called Nash, is a two-player game. It first appeared, under the name Polygon, in the Danish newspaper Politiken in 1942. In 1948, the mathematician John Nash independently reinvented it and called it Nash. In 1952, the toy manufacturer Parker Brothers published it under the name Hex. On the Hex board, the two players take turns placing a stone of their color on an empty cell. The goal for each player is to form a connected path of their own stones linking the two opposing sides of the board marked by their color, before the opponent connects his or her sides in the same fashion; the first player to complete a connection wins the game. John Nash proved, via the strategy-stealing argument, that the first player has a winning strategy, and boards smaller than 8×8 have already been solved by computer. In this research, we apply the AlphaZero training method, which combines Monte Carlo tree search with deep learning, and use reinforcement learning to train a model without human knowledge in order to solve Hex on boards of size 3 and 4. Following this approach, we hope that larger boards can also be solved in the future with more computational resources.
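The claim that small boards are fully solved can be illustrated without any learned knowledge: on a 3×3 board the game tree is tiny, so plain negamax search settles the winner exhaustively. The sketch below is our own minimal illustration under an assumed board encoding, not the thesis's AlphaZero-based method.

```python
# Exhaustive negamax solver for tiny Hex boards (assumed encoding:
# 0 = empty, 1 and 2 = the two players; player 1 connects the top and
# bottom rows, player 2 the left and right columns).
HEX_NEIGHBORS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]

def has_connection(board, player):
    """True if `player`'s stones link that player's two sides."""
    n = len(board)
    if player == 1:
        frontier = [(0, c) for c in range(n) if board[0][c] == 1]
        done = lambda r, c: r == n - 1
    else:
        frontier = [(r, 0) for r in range(n) if board[r][0] == 2]
        done = lambda r, c: c == n - 1
    seen = set(frontier)
    while frontier:
        r, c = frontier.pop()
        if done(r, c):
            return True
        for dr, dc in HEX_NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n
                    and board[nr][nc] == player
                    and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False

def solve(board, player):
    """Return +1 if the side to move wins with perfect play, else -1.

    Hex can never end in a draw, so every game ends with one player
    completing a connection; the search stops at the first winning move.
    Exhaustive search like this is feasible only on very small boards.
    """
    n = len(board)
    for r in range(n):
        for c in range(n):
            if board[r][c] != 0:
                continue
            board[r][c] = player
            value = 1 if has_connection(board, player) else -solve(board, 3 - player)
            board[r][c] = 0
            if value == 1:
                return 1
    return -1  # board full without a win: unreachable in Hex, kept for safety
```

Running `solve([[0] * 3 for _ in range(3)], 1)` returns `1`, confirming that the first player wins the 3×3 board, in line with Nash's strategy-stealing result.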

Keywords (English)

Hex, AlphaZero, Reinforcement Learning, Deep Learning

References


[1]. suragnair/alpha-zero-general, https://github.com/suragnair/alpha-zero-general.
[2]. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel and Demis Hassabis, “Mastering the game of Go without human knowledge”, Nature, volume 550, pages 354–359 (19 October 2017).
[3]. Jakub Pawlewicz, Ryan Hayward, Philip Henderson, and Broderick Arneson, “Stronger Virtual Connections in Hex”.
[4]. Broderick Arneson, Ryan B. Hayward, Philip Henderson, “Monte Carlo Tree Search in Hex”.
[5]. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”.
