End-to-End Training of a Car Racing Game AI Bot with Deep Reinforcement Learning

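We propose a purely end-to-end deep reinforcement learning method for training a car racing game AI bot. Training uses only the speed information shown on the game screen and no other internal game state, such as the car's facing angle, and through self-learning the bot surpasses the level of the average human player. We propose a reward function composed solely of speed, and train with a variant of Deep Q Network combined with the distributed training architecture Ape-X to address the sparse training signal that this reward function provides. In addition, we propose a method that limits the learner's rate, which substantially improves both training speed and the final training result. The AI bot trained with this method exceeds average human performance and approaches the level of professional players.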

Keywords

Deep learning; Deep reinforcement learning; Video game bot; End-to-end learning; Car racing game

Parallel Abstract

We propose a pure end-to-end deep reinforcement learning method for training a car racing game AI bot. The method uses only the velocity information extracted from the screen in both the training and testing phases, without any internal state from the game environment, such as the car's facing angle. The learned AI bot plays better than the average human player. In our approach, we design a reward function consisting only of the velocity value, and use the Ape-X distributed training framework combined with a variant of Deep Q Network to address the sparse training signal caused by this reward function. Moreover, we propose a limit-learner-rate method that improves training efficiency and training performance. The AI bot trained in this way performs beyond the average human level and approaches the level of professional players.

Parallel Keywords

Deep learning; Deep reinforcement learning; Video game AI bot; End-to-end learning; Car racing game
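
The abstract names two ingredients, a reward built only from velocity and a limit on the learner's rate, without giving their exact form. The following is a minimal sketch of one possible reading, not the thesis's implementation. The names `velocity_reward` and `LearnerRateLimiter`, the `max_speed` scale of 300, and the cap on learner updates per collected frame are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


def velocity_reward(speed: float, max_speed: float = 300.0) -> float:
    """Reward computed only from the on-screen speed, scaled to [0, 1].

    max_speed is an assumed normalisation constant, not a value from the thesis.
    """
    clipped = min(max(speed, 0.0), max_speed)
    return clipped / max_speed


@dataclass
class LearnerRateLimiter:
    """Allow a learner update only while updates stay under a cap
    proportional to the number of frames the actors have collected.

    This is one assumed interpretation of "limiting the learner rate".
    """

    max_updates_per_frame: float = 0.25
    frames_collected: int = 0
    updates_done: int = 0

    def record_frames(self, n: int) -> None:
        # Called when actors add n new transitions to the replay memory.
        self.frames_collected += n

    def may_update(self) -> bool:
        # True while the learner is still below its update budget.
        return self.updates_done < self.max_updates_per_frame * self.frames_collected

    def record_update(self) -> None:
        # Called after each gradient step taken by the learner.
        self.updates_done += 1
```

In a distributed setup of the Ape-X kind, the learner loop would check `may_update()` before each gradient step and wait for more actor data otherwise, so that the learner cannot run far ahead of the experience being collected.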

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, Human-level control through deep reinforcement learning, Nature, 2015.

[2] Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver, Distributed Prioritized Experience Replay, ICLR 2018.

[3] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ICML 2016.

[4] Etienne Perot, Maximilian Jaritz, Marin Toromanoff, Raoul De Charette, End-to-End Driving in a Realistic Racing Game with Deep Reinforcement Learning, CVPR 2017 workshop.

[5] Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, Rainbow: Combining Improvements in Deep Reinforcement Learning, arXiv preprint arXiv:1710.02298.
