Implementing the Game of Breakthrough Based on an AlphaZero General Framework

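In computer game-playing research within artificial intelligence, the top programs for most board games are now built on the AlphaZero approach, and they play far more strongly than earlier traditional programs. Much of the development work in this architecture, however, does not depend on the rules of a particular game, so building a program for a new game incurs a large amount of repeated up-front development cost. This thesis therefore implements the game rules and search-tree handling in C++ and the neural network training in Python with TensorFlow, and combines the two into a general-purpose AlphaZero framework that is easy to read and runs more efficiently. With this framework, a user only needs to change the game rules to start AlphaZero training. Compared with alpha-zero-general, the open-source program on GitHub that Surag Nair wrote entirely in Python, the framework improves single-threaded speed on Breakthrough by 77.8%. In addition, this thesis implements and tests three possible improvements intended to raise the playing strength of the overall AlphaZero training process. These modifications do not depend on the rules of a particular game, so any game later ported to the general framework can benefit from them. They are: the Replay method, which augments the training data; the MMoE (Multi-gate Mixture-of-Experts) network architecture, applied to AlphaZero to strengthen the model's predictive ability; and the Quick Win method, which improves on the original AlphaZero by learning to win as quickly as possible, changing how the neural network's training labels are assigned and modifying the Monte Carlo tree search algorithm.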
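To make the idea of "changing only the game rules" concrete, the following is a minimal sketch of what a swappable game module for Breakthrough could look like. It is written in Python for brevity, whereas the thesis implements this part in C++. The class name `BreakthroughGame` and the methods `legal_moves`, `play`, and `winner` are hypothetical and are not taken from the thesis or from alpha-zero-general. The sketch assumes the standard Breakthrough rules: pieces move one square forward, straight or diagonally, capture only diagonally, and a player wins by reaching the opponent's home row.

```python
from typing import List, Tuple

# A move is (from_row, from_col, to_row, to_col). Hypothetical encoding.
Move = Tuple[int, int, int, int]

EMPTY, WHITE, BLACK = 0, 1, -1


class BreakthroughGame:
    """Hypothetical game module: the only part a user would replace."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.board = [[EMPTY] * size for _ in range(size)]
        # BLACK starts on the top two rows and moves toward the last row;
        # WHITE starts on the bottom two rows and moves toward row 0.
        for col in range(size):
            for row in (0, 1):
                self.board[row][col] = BLACK
            for row in (size - 2, size - 1):
                self.board[row][col] = WHITE
        self.to_move = WHITE

    def legal_moves(self) -> List[Move]:
        step = -1 if self.to_move == WHITE else 1
        moves: List[Move] = []
        for r in range(self.size):
            for c in range(self.size):
                if self.board[r][c] != self.to_move:
                    continue
                nr = r + step
                if not 0 <= nr < self.size:
                    continue
                # Straight ahead: only onto an empty square (no capture).
                if self.board[nr][c] == EMPTY:
                    moves.append((r, c, nr, c))
                # Diagonal: onto an empty square or capturing an enemy piece.
                for nc in (c - 1, c + 1):
                    if 0 <= nc < self.size and self.board[nr][nc] != self.to_move:
                        moves.append((r, c, nr, nc))
        return moves

    def play(self, move: Move) -> None:
        r, c, nr, nc = move
        self.board[nr][nc] = self.board[r][c]  # overwrites a captured piece
        self.board[r][c] = EMPTY
        self.to_move = -self.to_move

    def winner(self) -> int:
        """Return WHITE, BLACK, or EMPTY (0) while the game is still running."""
        if WHITE in self.board[0]:
            return WHITE
        if BLACK in self.board[self.size - 1]:
            return BLACK
        pieces = [p for row in self.board for p in row]
        if WHITE not in pieces:
            return BLACK
        if BLACK not in pieces:
            return WHITE
        return EMPTY
```

In a design like this, the self-play loop, the tree search, and the training code would call only these three methods, so supporting another game would mean supplying a different class with the same interface.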

Keywords

Computer games; AlphaZero; Breakthrough; Neural network; Deep learning

Parallel Abstract

In the field of artificial intelligence, many game-playing programs based on the AlphaZero approach outperform programs that use traditional techniques. However, implementing a new game program with the AlphaZero framework from scratch incurs similar, repeated development costs each time. Our work implements an efficient and easy-to-use AlphaZero framework in C++ and Python. Users can start the whole AlphaZero training process immediately by modifying only the game module. Compared with the alpha-zero-general program written by Surag Nair and published on GitHub, we achieve a 77.8% single-threaded speedup on the game of Breakthrough. Further, we implement and test three possible improvements to the AlphaZero approach: the Replay method for augmenting training data, the MMoE (Multi-gate Mixture-of-Experts) method for enhancing the neural network model, and the Quick Win method for learning how to win faster.
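As an illustration of the MMoE idea mentioned above, here is a minimal forward-pass sketch in plain Python. It assumes that the two "tasks" are AlphaZero's policy and value outputs, which the abstract does not state explicitly. The class name `TinyMMoE`, the layer sizes, and the random weights are hypothetical, and the thesis itself uses TensorFlow rather than hand-written layers. The sketch shows only the defining structure of MMoE: a set of shared experts, one softmax gate per task that mixes the expert outputs, and a separate head per task.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def relu(z: Vector) -> Vector:
    return [max(0.0, v) for v in z]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyMMoE:
    """Hypothetical two-task MMoE: shared experts, one gate per task."""

    def __init__(self, n_in: int, n_hidden: int, n_experts: int,
                 n_actions: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Experts are shared by both tasks.
        self.experts = [random_matrix(n_hidden, n_in, rng)
                        for _ in range(n_experts)]
        # Each task has its own gate producing one weight per expert.
        self.gates: Dict[str, Matrix] = {
            task: random_matrix(n_experts, n_in, rng)
            for task in ("policy", "value")
        }
        self.policy_head = random_matrix(n_actions, n_hidden, rng)
        self.value_head = random_matrix(1, n_hidden, rng)

    def _mix(self, task: str, x: Vector, expert_outs: List[Vector]) -> Vector:
        # Gate weights sum to 1 and form a convex mix of expert outputs.
        weights = softmax(matvec(self.gates[task], x))
        n_hidden = len(expert_outs[0])
        return [sum(w * out[j] for w, out in zip(weights, expert_outs))
                for j in range(n_hidden)]

    def forward(self, x: Vector) -> Tuple[Vector, float]:
        expert_outs = [relu(matvec(w, x)) for w in self.experts]
        policy_in = self._mix("policy", x, expert_outs)
        value_in = self._mix("value", x, expert_outs)
        policy = softmax(matvec(self.policy_head, policy_in))
        value = math.tanh(matvec(self.value_head, value_in)[0])
        return policy, value
```

Calling `TinyMMoE(n_in=4, n_hidden=3, n_experts=2, n_actions=5).forward(x)` with a four-element input returns a probability vector over five actions and a scalar in [-1, 1]. Because each task has its own gate, the policy and value outputs can weight the shared experts differently for the same position.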

Parallel Keywords

Computer games; AlphaZero; Breakthrough; Neural network; Deep learning

