
Research on StarCraft Game Strategy using Reinforcement Learning Technologies

Advisor: 張元翔

Abstract

Recently, reinforcement learning has become one of the major trends in the development of artificial intelligence. This study applies the Rainbow reinforcement learning algorithm to StarCraft game strategy, using the algorithm's fusion architecture to overcome the performance limitations of single-structure models. It combines deep learning and reinforcement learning techniques, including Q-Learning, DQN, Double DQN, Dueling DQN, Prioritized Experience Replay, Noisy-Net, Distributional RL, and Multi-Step learning, and integrates them into the Rainbow algorithm, which is trained on two StarCraft mini-game tasks for strategy research and performance improvement. The results show that the Rainbow architecture improves performance by approximately 25% over the DQN architecture. In addition, exploiting known strategies enables the model to complete the tasks effectively: exploration is strengthened in the early stage of training, while the model is continuously refined through experience replay in the middle and late stages. In summary, the Rainbow reinforcement learning algorithm proposed in this study effectively improves the learning performance of a single deep reinforcement learning network architecture; in the two StarCraft training tasks, it converges within a shorter training period and breaks through the score bottleneck of each task.
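Rainbow does not run these components side by side; it layers them onto a single value network. The sketch below is a minimal illustration assuming PyTorch; the class names, layer sizes, and 51-atom categorical support are placeholders, not the thesis implementation. It shows how three of the named components combine in one head: Noisy-Net layers supply learned exploration noise, the dueling split separates state value from action advantages, and the distributional output models a categorical distribution over returns instead of a single Q-value.

```python
# Minimal sketch of a Rainbow-style head (assumed PyTorch; illustrative only).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian noise (Noisy-Net exploration)."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)

    @staticmethod
    def _f(x):
        # Factorized-noise transform: sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)

class RainbowHead(nn.Module):
    """Dueling value/advantage streams over a categorical (C51) return."""
    def __init__(self, feat_dim, n_actions, n_atoms=51):
        super().__init__()
        self.n_actions, self.n_atoms = n_actions, n_atoms
        self.value = NoisyLinear(feat_dim, n_atoms)
        self.advantage = NoisyLinear(feat_dim, n_actions * n_atoms)

    def forward(self, feats):
        v = self.value(feats).view(-1, 1, self.n_atoms)
        a = self.advantage(feats).view(-1, self.n_actions, self.n_atoms)
        logits = v + a - a.mean(dim=1, keepdim=True)  # dueling combination
        return F.softmax(logits, dim=-1)  # per-action return distribution
```

Acting greedily with this head means taking the argmax over each action's expected return under its distribution; exploration comes from the learned noise in NoisyLinear rather than from an epsilon-greedy schedule, which is why no separate exploration hyperparameter appears above.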

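The remaining components change the training target and the loss rather than the network. Double DQN selects the next action with the online network but evaluates it with the target network, multi-step learning bootstraps after n rewards instead of one, and prioritized replay reweights each sample by an importance weight while feeding absolute TD errors back as new priorities. The sketch below shows these in scalar-Q form for readability (the distributional variant instead projects the target onto the return atoms); it again assumes PyTorch, and the network and tensor arguments are hypothetical placeholders, not the thesis code.

```python
# Sketch of a multi-step Double-DQN target with prioritized-replay weighting
# (assumed PyTorch; scalar-Q form, illustrative only).
import torch

def multi_step_double_dqn_target(online_net, target_net, next_obs,
                                 n_step_reward, not_done, gamma, n):
    """target = R_n + gamma^n * Q_target(s_{t+n}, argmax_a Q_online(s_{t+n}, a))."""
    with torch.no_grad():
        best_actions = online_net(next_obs).argmax(dim=1)   # select: online net
        next_q = target_net(next_obs).gather(               # evaluate: target net
            1, best_actions.unsqueeze(1)).squeeze(1)
        return n_step_reward + (gamma ** n) * next_q * not_done

def prioritized_td_loss(q_pred, q_target, is_weights, eps=1e-6):
    """Importance-weighted squared TD loss; |TD error| becomes the new priority."""
    td_error = q_target - q_pred
    loss = (is_weights * td_error.pow(2)).mean()
    new_priorities = td_error.abs().detach() + eps
    return loss, new_priorities
```

Decoupling action selection (online network) from action evaluation (target network) is what removes the overestimation bias of plain Q-learning targets, and the returned priorities are what let the replay buffer revisit high-error transitions more often, matching the continuous correction through experience replay described above.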

