Thesis


Improving the AlphaZero Algorithm in the Playing and Training Phases

Advisor: 林順喜


Abstract


AlphaZero has achieved great success across many challenging games, but it requires enormous computational power to train a strong model. Rather than investing such resources, we focus on improving the performance of AlphaZero itself. In this work, we introduce seven major enhancements to AlphaZero. First, the AlphaZero-miniMax Hybrids strategy combines the modern AlphaZero approach with a traditional search algorithm to improve the strength of the program. Second, the Proven-mark strategy prunes unneeded moves to avoid the re-sampling problem and increase the chance of exploring promising moves. Third, the Quick-win strategy distinguishes rewards according to the length of the game-tree search, no longer treating all wins (or losses) equally. Fourth, the Best-win strategy resolves an inaccurate-win-rate problem by backing up the best reward rather than the average. Fifth, the Threat-space-reduction strategy improves the performance of neural-network training under limited resources. Sixth, the Big-win strategy takes into consideration the number of points in the final outcome instead of simply labeling win/loss/draw. Finally, the Multistage-training strategy improves the quality of the neural network for multistage games. After years of work, we have obtained promising results that improve the performance of the AlphaZero algorithm on several test domains.
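Two of the strategies above can be illustrated in a few lines. The following is a minimal sketch, not the thesis's actual code: the reward formula, function names, and the `max_moves` parameter are illustrative assumptions. It contrasts the standard averaged MCTS backup with a Best-win (best-reward) backup, and shows a Quick-win-style reward that favors shorter wins.

```python
def quick_win_reward(winner: int, player: int, moves_played: int,
                     max_moves: int = 100) -> float:
    """Quick-win idea (illustrative formula, an assumption): scale the
    terminal reward by game length so faster wins score higher and
    slower losses are penalized less."""
    if winner == 0:  # draw
        return 0.0
    sign = 1.0 if winner == player else -1.0
    # A shorter game retains more of the full +/-1 reward.
    return sign * (1.0 - 0.5 * moves_played / max_moves)


class Node:
    """A bare MCTS node holding only the statistics needed for backup."""

    def __init__(self) -> None:
        self.visits = 0
        self.value_sum = 0.0        # for the standard averaged backup
        self.best = float("-inf")   # for the Best-win backup

    def backup_average(self, reward: float) -> float:
        """Standard AlphaZero-style backup: running average of rewards."""
        self.visits += 1
        self.value_sum += reward
        return self.value_sum / self.visits

    def backup_best(self, reward: float) -> float:
        """Best-win backup: keep the best reward seen instead of the mean,
        so one proven strong continuation is not diluted by weaker samples."""
        self.visits += 1
        self.best = max(self.best, reward)
        return self.best
```

With the averaged backup, one losing sample drags down a node that also has a winning continuation; `backup_best` keeps the node's value at the best reward observed, which is the inaccurate-win-rate problem the Best-win strategy targets.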

