
Net2Net Extension for the AlphaGo Zero Algorithm

Advisor: 吳毅成 (I-Chen Wu)

Abstract


The number of residual network blocks in a computer Go program following the AlphaGo Zero algorithm is one of the key factors in the program's playing strength. This paper proposes a method to deepen the residual network without reducing its performance. Next, since self-play tends to be the most time-consuming part of AlphaGo Zero training, we demonstrate how the deepened network can continue training on the self-play records generated by the original network, saving training time. The deepening is performed by inserting new layers into the original network, and we present three insertion schemes based on the concept behind Net2Net. Lastly, among the many ways to sample the previously generated self-play records, we propose two methods that allow the deepened network to continue the training process. In our experiment extending a network from 20 to 40 residual blocks for $9 \times 9$ Go, the best-performing extension scheme achieves a 61.69\% win rate against the unextended player (20 blocks) while greatly reducing the time spent on self-play.

Keywords

machine learning, network extension
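The function-preserving insertion idea can be illustrated with a toy sketch. This is not the thesis's actual implementation (the paper presents three insertion schemes whose details are not given here); it shows one common Net2Net-style construction: a new residual block whose final layer is zero-initialized computes the identity, so inserting it leaves the network's output unchanged. The fully connected block and the dimensions below are illustrative assumptions.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: y = x + w2 @ relu(w1 @ x)."""
    return x + w2 @ np.maximum(w1 @ x, 0.0)

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)

# An existing "trained" block (random weights stand in for trained ones).
w1, w2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
y_before = residual_block(x, w1, w2)

# Net2Net-style insertion: the inserted block's final layer is
# zero-initialized, so its residual branch outputs zero and the block
# reduces to the identity mapping.
new_w1 = rng.normal(size=(d, d))   # free to initialize randomly
new_w2 = np.zeros((d, d))          # zero => F(x) = 0 => block is identity
y_after = residual_block(residual_block(x, w1, w2), new_w1, new_w2)

assert np.allclose(y_before, y_after)  # deepened net computes the same function
```

Because the deepened network initially computes exactly the same function as the original, its evaluations of the existing self-play positions are unchanged, which is what makes continuing training from the original network's records a sensible starting point.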

