
Adaptive Behavior Learning Social Robot Navigation with Composite Reinforcement Learning (以合成強化式學習之適應行為學習之社交機器人導航)

Advisor: Li-Chen Fu (傅立成)

Abstract


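For a service robot, navigation that considers only criteria such as the shortest path is not enough. In environments shared by humans and robots, the robot must satisfy these criteria and also move in a way that people perceive as natural. To make the robot follow such 'social rules', having it learn social navigation through machine learning is more suitable than having researchers tediously hand-design features. Deep reinforcement learning has recently been introduced into robotics research, yet few studies have considered using this learning framework to solve the social navigation problem, which is high-dimensional. To address these issues, this research proposes composite reinforcement learning, a framework that lets the robot learn from sensor input how to generate appropriate velocities. The system uses deep reinforcement learning to learn the robot's social navigation velocity in specific scenarios, and a reward update module lets people give feedback to the robot. To make the system more general, we do not use simulation or pre-collected data, because these lack the interaction between robot and humans in a real environment. We deploy the system directly in real space and propose methods that use human prior knowledge to address the excessively long training time of deep reinforcement learning. The system gradually learns how to control the robot's velocity under a given set of conditions, and it adjusts those conditions using people's feedback so as to capture the social rules of the moment. Experiments show that the proposed composite reinforcement learning can learn social navigation and can do so within a reasonable time. Updating the reward further enables the system to learn more suitable navigation behavior.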

Parallel Abstract


For a service robot, navigation that considers only metrics such as the shortest path is not enough. In an environment where robots and humans coexist, the robot must not only consider such metrics but also move in a way that humans perceive as natural. To follow such 'social norms' in the environment, using a learning method to teach the robot how to navigate is easier than tediously designing handcrafted rules. Recently, deep reinforcement learning (DRL) has been applied to robotics. However, very few researchers have considered applying DRL to the social navigation problem, which lies in a high-dimensional space. To address these problems, this research proposes the composite reinforcement learning (CRL) system, a framework that uses sensor input to learn how to generate the velocity of the robot. The system uses DRL to learn the velocity in a given set of scenarios, together with a reward update module that provides ways of updating the reward function based on human feedback. To generalize the system, we do not use a simulator or pre-collected data, which lack real interaction between humans and the robot. We apply our system directly to the real environment and provide methods to cope with the long training time of DRL in a real environment by incorporating prior knowledge into the system. The CRL system can incrementally learn to determine its velocity according to given rules (e.g., reward functions). It also keeps collecting human feedback to keep the reward functions inside the system synchronized with the current social norms. The experiments show that the proposed CRL system can learn how to navigate in a reasonable time. Updating the reward enables the system to learn a more suitable navigation style.
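
The interplay between a learned velocity policy and a reward function that is revised from human feedback can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the network, the state representation, or the update rule, so everything below is an assumption made for illustration. In particular, a tabular Q-learning update stands in for DRL, the reward is assumed to be a linear combination of two hypothetical features (`progress` and `proximity`), and human feedback is assumed to arrive as +1 (approve) or -1 (disapprove).

```python
from dataclasses import dataclass, field


@dataclass
class RewardModel:
    """Hypothetical linear reward: r = sum_i w_i * feature_i (an assumption)."""
    weights: dict = field(default_factory=lambda: {
        "progress": 1.0,    # reward for moving toward the goal
        "proximity": -1.0,  # penalty for being close to a person
    })
    step_size: float = 0.1  # how strongly one piece of feedback shifts the weights

    def reward(self, features):
        """Weighted sum of the feature values observed at this step."""
        return sum(self.weights[k] * features[k] for k in self.weights)

    def update_from_feedback(self, features, feedback):
        """Shift each weight along its feature value.

        feedback is +1 if the person approved of the behavior and -1 if not,
        so disapproval lowers the reward of situations with these features.
        """
        for k in self.weights:
            self.weights[k] += self.step_size * feedback * features[k]


def q_update(q, state, action, r, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (a stand-in for the DRL update)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    model = RewardModel()
    # The robot made progress but passed close to a person.
    features = {"progress": 0.5, "proximity": 0.8}
    before = model.reward(features)
    model.update_from_feedback(features, feedback=-1)  # the person disapproved
    after = model.reward(features)
    print(before, after)  # the same situation is now rewarded less

    # Velocity commands are discretized into three hypothetical actions.
    actions = ["slow", "medium", "fast"]
    q = {}
    q_update(q, "person_near", "fast", after, "person_near", actions)
    print(q)
```

In this sketch the learner and the reward model are deliberately separate objects, which mirrors the abstract's description of a learning component plus a reward update module: feedback changes only the reward weights, and the policy adapts through subsequent learning steps.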

