
Research on StarCraft Game Strategy using Reinforcement Learning Technologies

Advisor: 張元翔

Abstract

Recently, reinforcement learning has become one of the major trends in the development of artificial intelligence. This study applies the Rainbow reinforcement learning algorithm to StarCraft game strategy, using the algorithm's fusion architecture to overcome the performance limitations of single-structure models. It combines deep learning and reinforcement learning techniques, including Q-Learning, DQN, Double DQN, Dueling DQN, Prioritized Experience Replay, Noisy-Net, Distributional RL, and Multi-Step learning, and integrates them into the Rainbow algorithm, which is trained on two StarCraft mini-game tasks for strategy research and performance improvement. The results show that the Rainbow architecture improves performance by approximately 25% over the DQN architecture. In addition, exploiting known strategies enables the model to complete the tasks effectively: exploration is strengthened in the early stage of training, while the model is continuously refined through experience replay in the middle and late stages. In summary, the Rainbow reinforcement learning algorithm proposed in this study effectively improves the learning performance of a single deep reinforcement learning network architecture; in the two StarCraft training tasks, it converges within a shorter training period and breaks through the score bottleneck of each task.
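Rainbow does not run these components side by side; it layers them onto a single value network. The sketch below is a minimal illustration assuming PyTorch; the class names, layer sizes, and 51-atom categorical support are placeholders, not the thesis implementation. It shows how three of the named components combine in one head: Noisy-Net layers supply learned exploration noise, the dueling split separates state value from action advantages, and the distributional output models a categorical distribution over returns instead of a single Q-value.

```python
# Minimal sketch of a Rainbow-style head (assumed PyTorch; illustrative only).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian noise (Noisy-Net exploration)."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)

    @staticmethod
    def _f(x):
        # Factorized-noise transform: sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)

class RainbowHead(nn.Module):
    """Dueling value/advantage streams over a categorical (C51) return."""
    def __init__(self, feat_dim, n_actions, n_atoms=51):
        super().__init__()
        self.n_actions, self.n_atoms = n_actions, n_atoms
        self.value = NoisyLinear(feat_dim, n_atoms)
        self.advantage = NoisyLinear(feat_dim, n_actions * n_atoms)

    def forward(self, feats):
        v = self.value(feats).view(-1, 1, self.n_atoms)
        a = self.advantage(feats).view(-1, self.n_actions, self.n_atoms)
        logits = v + a - a.mean(dim=1, keepdim=True)  # dueling combination
        return F.softmax(logits, dim=-1)  # per-action return distribution
```

Acting greedily with this head means taking the argmax over each action's expected return under its distribution; exploration comes from the learned noise in NoisyLinear rather than from an epsilon-greedy schedule, which is why no separate exploration hyperparameter appears above.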

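The remaining components change the training target and the loss rather than the network. Double DQN selects the next action with the online network but evaluates it with the target network, multi-step learning bootstraps after n rewards instead of one, and prioritized replay reweights each sample by an importance weight while feeding absolute TD errors back as new priorities. The sketch below shows these in scalar-Q form for readability (the distributional variant instead projects the target onto the return atoms); it again assumes PyTorch, and the network and tensor arguments are hypothetical placeholders, not the thesis code.

```python
# Sketch of a multi-step Double-DQN target with prioritized-replay weighting
# (assumed PyTorch; scalar-Q form, illustrative only).
import torch

def multi_step_double_dqn_target(online_net, target_net, next_obs,
                                 n_step_reward, not_done, gamma, n):
    """target = R_n + gamma^n * Q_target(s_{t+n}, argmax_a Q_online(s_{t+n}, a))."""
    with torch.no_grad():
        best_actions = online_net(next_obs).argmax(dim=1)   # select: online net
        next_q = target_net(next_obs).gather(               # evaluate: target net
            1, best_actions.unsqueeze(1)).squeeze(1)
        return n_step_reward + (gamma ** n) * next_q * not_done

def prioritized_td_loss(q_pred, q_target, is_weights, eps=1e-6):
    """Importance-weighted squared TD loss; |TD error| becomes the new priority."""
    td_error = q_target - q_pred
    loss = (is_weights * td_error.pow(2)).mean()
    new_priorities = td_error.abs().detach() + eps
    return loss, new_priorities
```

Decoupling action selection (online network) from action evaluation (target network) is what removes the overestimation bias of plain Q-learning targets, and the returned priorities are what let the replay buffer revisit high-error transitions more often, matching the continuous correction through experience replay described above.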

