This thesis applies deep reinforcement learning to an autonomous vehicle in a simulated environment, aiming to overcome the bottlenecks encountered by traditional path-planning methods. The work is divided into two parts: (1) Q-learning with known environment information, and (2) deep Q-learning (DQN) with unknown environment information. In the Q-learning part, the agent's coordinates in the simulated environment serve as the state; after the Q-learning simulation, a Q-table is built and the optimal path is found. In the DQN part, a neural network is built with PyTorch: the agent's camera image serves as the state, a simple CNN together with the Q-network computes the best action, and after a sufficient number of training iterations the value function that achieves the optimal path is obtained. During training, DQN uses experience replay to randomly sample past experiences and learn from them. In addition to experience replay, DQN maintains two neural networks with the same structure but different parameters in order to fix the Q-value targets (fixed Q-targets). Together, these two techniques break the correlations in the training data and make the neural network updates more efficient. In the experiments, this thesis evaluates the approach in both the simulated environment and a real-world field. Training with deep reinforcement learning in simulation greatly reduces the required human effort, and the results verify that the vehicle can rely on the weights trained in the simulation software to avoid obstacles autonomously in the real field, reproduce the control behavior obtained in simulation, and ultimately reach its destination.
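For part (1), the abstract describes tabular Q-learning in which the agent's coordinates are the state and a Q-table stores the estimated return of each state-action pair. The following is a minimal sketch of that idea, assuming a hypothetical grid-world interface (`env.reset()`, `env.step()`, and `env.n_actions` are illustrative placeholders; the thesis's actual simulation environment, reward design, and hyperparameters are not specified in this abstract):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning sketch: the agent's grid coordinate is the state,
    and the Q-table maps each state to a list of action values."""
    q_table = defaultdict(lambda: [0.0] * env.n_actions)

    for _ in range(episodes):
        state = env.reset()            # e.g. the (x, y) cell of the vehicle (assumed interface)
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q_table[state][a])

            next_state, reward, done = env.step(action)   # assumed 3-tuple return

            # Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(q_table[next_state])
            q_table[state][action] += alpha * (
                reward + gamma * best_next * (not done) - q_table[state][action]
            )
            state = next_state

    return q_table
```

Once trained, the optimal path can be read off the Q-table by repeatedly taking the greedy action from the start state.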
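For part (2), the abstract combines a small CNN Q-network, experience replay, and fixed Q-targets in PyTorch. The sketch below illustrates these mechanisms only; the input size (assumed 84x84 RGB images), network widths, and hyperparameters are illustrative assumptions and not taken from the thesis:

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """A small CNN that maps an image observation to one Q-value per action."""
    def __init__(self, n_actions, in_channels=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        # Assumes 84x84 inputs: 84 -> 20 -> 9 after the two conv layers.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))

class ReplayBuffer:
    """Fixed-size buffer storing transitions; training samples random minibatches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # States are assumed to be CHW float tensors.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.stack(states),
                torch.tensor(actions, dtype=torch.int64),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states),
                torch.tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)

def dqn_update(policy_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    """One gradient step: replay a random minibatch and regress Q(s, a)
    toward the fixed target r + gamma * max_a' Q_target(s', a')."""
    if len(buffer) < batch_size:
        return
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)

    # Q-values of the actions actually taken, from the online network.
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Targets come from the separate target network (fixed Q-targets).
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this scheme the target network is synchronized only occasionally, e.g. `target_net.load_state_dict(policy_net.state_dict())` every few thousand steps, so the regression targets stay fixed between syncs; together with random sampling from the replay buffer, this is what breaks the correlations in the training data.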