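We propose a purely end-to-end deep reinforcement learning method for training an AI bot for a car racing game. Training uses only the speed information shown on the game screen and no other internal game state, such as the car's facing angle, and through self-learning the bot surpasses the level of the average human player. We propose a reward function composed solely of speed, and train with the distributed training architecture Ape-X combined with a variant of Deep Q Network to address the relatively sparse training signal that this function provides. In addition, we propose a method that limits the learner's speed, which substantially improves training speed and the final training result. An AI bot trained with this method performs above the average human level and approaches the level of professional players.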
We propose a pure end-to-end deep reinforcement learning method for training a car racing game AI bot. The method uses only the velocity information extracted from the screen in both the training and testing phases, without any internal state from the game environment, such as the car's facing angle. We design a reward function consisting only of the velocity value, and use the Ape-X distributed training framework combined with a variant of Deep Q Network to address the sparse training signal that this reward function produces. Moreover, we propose a limit learner rate method that improves both training efficiency and final performance. The AI bot trained in this way performs beyond the average human level and reaches a level close to that of professional players.
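To make the two ideas above concrete, the following is a minimal sketch rather than the method itself, since the abstract gives neither the reward formula nor the mechanism used to limit the learner. It assumes the reward is the on-screen speed normalized by a hypothetical `max_speed`, and it reads "limit learner rate" as capping the number of learner updates per frame collected by the actors. The names `velocity_reward`, `LearnerRateLimiter`, and `max_updates_per_frame` are illustrative and do not come from the source.

```python
from dataclasses import dataclass


def velocity_reward(speed: float, max_speed: float = 300.0) -> float:
    """Hypothetical reward built only from speed, normalized to [0, 1].

    `max_speed` is an assumed constant, not a value from the source.
    """
    clipped = min(max(speed, 0.0), max_speed)
    return clipped / max_speed


@dataclass
class LearnerRateLimiter:
    """Hypothetical throttle for an Ape-X style learner.

    Allows at most `max_updates_per_frame` gradient updates per
    environment frame that the actors have collected so far.
    """

    max_updates_per_frame: float
    frames_collected: int = 0
    updates_done: int = 0

    def record_frames(self, n: int) -> None:
        # Called when actors push n new frames into the replay buffer.
        self.frames_collected += n

    def may_update(self) -> bool:
        # The learner may step only while it is under its update budget.
        budget = self.max_updates_per_frame * self.frames_collected
        return self.updates_done < budget

    def record_update(self) -> None:
        # Called after each learner gradient step.
        self.updates_done += 1
```

In a training loop under these assumptions, the learner would check `may_update()` before each gradient step and otherwise wait for the actors to add more frames, which keeps it from repeatedly fitting a small, stale replay buffer.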