
利用反向增強式學習法實現示範模仿學習

Imitation Learning Based on Inverse Reinforcement Learning

Advisors: 余國瑞, 黃國勝

Abstract


In this thesis, we aim to learn, or imitate, the behavior of an expert. Learning through a reward function R is one way to achieve this; its advantage is that a reward function gives a succinct, fault-tolerant, and transferable definition of the task. In reinforcement learning, however, there are situations in which the reward function R is difficult to tune by hand, and a way to adjust it automatically is needed. Inverse reinforcement learning arose in response to this need: by observing an expert's demonstrated behavior, it lets us recover a reward function R together with a policy π. Specifically, inverse reinforcement learning compares the policy π learned from the current reward function R against the expert's demonstrations and uses the discrepancy to correct R, iterating until the learned policy π can carry out the demonstrated behavior. Since this comparison step is similar in spirit to classification, we incorporate the idea of AdaBoost: by exploiting the relationship between error values and example weights, we can find a suitable reward function more quickly while obtaining the same behavior as the expert's demonstrations.
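To make the loop described above concrete, here is a minimal sketch in the spirit of apprenticeship learning with a linear reward R(s) = w·φ(s): learn a policy from the current reward, compare its feature counts with the expert's, and correct the reward weights. The value-iteration solver, the feature-matching update w ← μ_E − μ(π), and every name below are illustrative assumptions; the thesis's actual AdaBoost-based correction is not specified in the abstract.

```python
import numpy as np

def solve_mdp(reward, transitions, gamma=0.95, n_iters=200):
    """Value iteration: return a greedy policy for the given reward.
    transitions[a] is an (S, S) matrix of P(s' | s, a)."""
    n_actions, n_states = len(transitions), transitions[0].shape[0]
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.stack([reward + gamma * transitions[a] @ v
                      for a in range(n_actions)])
        v = q.max(axis=0)
    return q.argmax(axis=0)  # deterministic policy: state -> action

def feature_expectations(policy, transitions, features, start,
                         gamma=0.95, horizon=100):
    """Discounted expected feature counts mu(pi) of a fixed policy."""
    n_states, n_features = features.shape
    p_pi = np.stack([transitions[policy[s]][s] for s in range(n_states)])
    d = np.zeros(n_states)
    d[start] = 1.0  # start-state distribution
    mu = np.zeros(n_features)
    for t in range(horizon):
        mu += (gamma ** t) * (d @ features)
        d = d @ p_pi  # one-step state-distribution update under the policy
    return mu

def irl_by_feature_matching(mu_expert, transitions, features, start,
                            n_rounds=20, tol=1e-3):
    """Outer loop from the abstract: learn a policy from the current
    reward, compare it with the expert, and correct the reward."""
    w = np.zeros(features.shape[1])
    policy = None
    for _ in range(n_rounds):
        reward = features @ w            # linear reward R(s) = w . phi(s)
        policy = solve_mdp(reward, transitions)
        mu = feature_expectations(policy, transitions, features, start)
        w = mu_expert - mu               # step toward the expert's feature counts
        if np.linalg.norm(w) < tol:      # learned policy matches the expert
            break
    return features @ w, policy
```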

Parallel Abstract (English)


In this thesis, we aim to learn or imitate the behavior of an expert. Using a reward function to learn the behavior is one way to reach this goal; the reward function provides the most succinct, robust, and transferable definition of the task. In reinforcement learning, there are situations in which it is hard to tweak the reward function manually, so an approach that adjusts the reward function automatically is important. Inverse reinforcement learning is a solution to this problem: it uses the current reward function to learn a policy π, compares that policy with the expert's demonstrations, and extracts information from the comparison to adjust the reward function. This procedure is repeated until we obtain a policy that does what the expert does. The comparison step is like classification, so we add the concept of AdaBoost, which combines weak classifiers into a strong one. This helps our inverse reinforcement learning not only learn faster but also obtain a reward function and a policy that choose the actions the expert would choose.
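The abstract likens the comparison between the learned policy and the expert's demonstrations to classification and borrows AdaBoost's error/weight relationship. The exact scheme is not given in the abstract, so the sketch below simply applies the standard AdaBoost reweighting rule, treating each demonstrated state-action pair as a training example; `adaboost_reweight`, `demos`, and `policy` are hypothetical names.

```python
import numpy as np

def adaboost_reweight(weights, agree, eps=1e-10):
    """One AdaBoost-style round over expert demonstration steps.

    weights : importance of each demonstrated (state, action) pair
    agree   : boolean array, True where the learned policy chose the
              expert's action at that step
    Returns the renormalized weights and the round's confidence alpha.
    """
    weights = weights / weights.sum()
    err = np.clip(weights[~agree].sum(), eps, 1.0 - eps)  # weighted error
    alpha = 0.5 * np.log((1.0 - err) / err)               # round confidence
    # standard AdaBoost rule: upweight mistakes, downweight correct steps
    new_w = weights * np.exp(np.where(agree, -alpha, alpha))
    return new_w / new_w.sum(), alpha

# Hypothetical usage: demos is a list of expert (state, action) pairs and
# policy maps states to actions. Steps where the learner still disagrees
# gain weight, steering the next reward-function correction toward them.
# agree = np.array([policy[s] == a for s, a in demos])
# weights, alpha = adaboost_reweight(weights, agree)
```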
