Thesis


Improving the AlphaZero Algorithm in the Playing and Training Phases

Advisor: 林順喜


Abstract


AlphaZero has achieved great success across many challenging games, but it requires enormous computational power to train a strong model. Rather than investing such resources, we focus on improving the performance of AlphaZero itself. In this work, we introduce seven major enhancements to AlphaZero. First, the AlphaZero-miniMax Hybrids strategy combines the modern AlphaZero approach with a traditional search algorithm to improve the strength of the program. Second, the Proven-mark strategy prunes unneeded moves to avoid the re-sampling problem and increase the chance of exploring promising moves. Third, the Quick-win strategy distinguishes rewards according to the length of the game-tree search, no longer treating all wins (or losses) equally. Fourth, the Best-win strategy resolves an inaccurate-win-rate problem by backing up the best reward rather than the average. Fifth, the Threat-space-reduction strategy improves the performance of neural-network training under limited resources. Sixth, the Big-win strategy takes into consideration the number of points in the final outcome instead of simply labeling win/loss/draw. Finally, the Multistage-training strategy improves the quality of the neural network for multistage games. After years of work, we have obtained promising results that improve the performance of the AlphaZero algorithm on several test domains.
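Two of the strategies above can be illustrated in a few lines. The following is a minimal sketch, not the thesis's actual code: the reward formula, function names, and the `max_moves` parameter are illustrative assumptions. It contrasts the standard averaged MCTS backup with a Best-win (best-reward) backup, and shows a Quick-win-style reward that favors shorter wins.

```python
def quick_win_reward(winner: int, player: int, moves_played: int,
                     max_moves: int = 100) -> float:
    """Quick-win idea (illustrative formula, an assumption): scale the
    terminal reward by game length so faster wins score higher and
    slower losses are penalized less."""
    if winner == 0:  # draw
        return 0.0
    sign = 1.0 if winner == player else -1.0
    # A shorter game retains more of the full +/-1 reward.
    return sign * (1.0 - 0.5 * moves_played / max_moves)


class Node:
    """A bare MCTS node holding only the statistics needed for backup."""

    def __init__(self) -> None:
        self.visits = 0
        self.value_sum = 0.0        # for the standard averaged backup
        self.best = float("-inf")   # for the Best-win backup

    def backup_average(self, reward: float) -> float:
        """Standard AlphaZero-style backup: running average of rewards."""
        self.visits += 1
        self.value_sum += reward
        return self.value_sum / self.visits

    def backup_best(self, reward: float) -> float:
        """Best-win backup: keep the best reward seen instead of the mean,
        so one proven strong continuation is not diluted by weaker samples."""
        self.visits += 1
        self.best = max(self.best, reward)
        return self.best
```

With the averaged backup, one losing sample drags down a node that also has a winning continuation; `backup_best` keeps the node's value at the best reward observed, which is the inaccurate-win-rate problem the Best-win strategy targets.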

