Strategy-Enhanced Reinforcement Learning Based on Combining Long Short-Term Memory Models with Proximal Policy Optimization

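With the rise of research in artificial intelligence, many machine learning techniques have matured and been applied across a wide range of fields. The game domain, however, still has considerable room for development because of the complexity of games: a single action by an agent can lead to many different situations, which both greatly increases model complexity and lengthens training time. This study therefore proposes strategy-enhanced proximal policy optimization (SEPPO), a strategy-enhanced reinforcement learning method that combines a long short-term memory model with proximal policy optimization. SEPPO formulates the agent's strategy from features and uses the long short-term memory model to improve proximal policy optimization. Strategic judgment lets reinforcement learning reach the same level of performance more quickly. Experimental results in the game domain indicate that SEPPO effectively alleviates the problem of excessively long training time.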

Keywords

Long short-term memory model; proximal policy optimization; reinforcement learning

Parallel Abstract

As research on artificial intelligence has grown, many machine learning technologies have matured and been applied in one field after another. The game field, however, still has great room for development, and the reason is the complexity of games. A single action by an agent may create many different situations, which greatly increases the complexity of the model and makes training take longer. This study therefore proposes strategy-enhanced proximal policy optimization (SEPPO), which combines a long short-term memory model with proximal policy optimization: it formulates agent strategies based on features and uses the long short-term memory model to improve proximal policy optimization. Strategic judgment allows reinforcement learning to achieve the same results faster. Experiments with SEPPO in the game field show that it can effectively reduce training time.

Parallel Keywords

Long short-term memory model; proximal policy optimization; reinforcement learning
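
The abstract names proximal policy optimization as one of the two components of SEPPO but does not describe its update rule. As a rough illustration of that component only, the sketch below computes the clipped surrogate objective from the standard PPO formulation. It is not the SEPPO implementation: the function name `ppo_clip_objective`, its parameters, and the default clipping range of 0.2 are assumptions chosen for illustration, and the long short-term memory part is not modeled here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    All names and the default clip_eps are illustrative assumptions.
    """
    if not advantages:
        raise ValueError("batch must not be empty")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and the clipped term.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data in a single update, which is the property that makes PPO training comparatively stable.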
