Learning a Spatial Behavior Cognition Model in Dynamic Environments

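With the recent boom in robotics, robots have moved from industrial production lines into people's daily lives. Whether as pet robots, home-care robots, or guide robots, they will appear with increasing frequency in human environments such as schools, offices, hospitals, art museums, and even homes. For robots to adapt better to these environments, and for people to accept them more readily, robots must understand how human behavior relates to its surroundings. Human behavior is shaped by implicit factors such as culture and custom, law, and even psychological state, so a robot that wants to be accepted must do as the locals do: it must try to understand people's socialized use of space and observe shared social norms. The aim of this thesis is to develop a "Dynamic Spatial Behavior Cognition Model." The model teaches a robot the social behaviors that are specific to an environment or that cannot be observed directly by eye, and the learned result can be fine-tuned during subsequent operation to suit the environment. The robot imitates human behavior through inverse reinforcement learning. However, because each person perceives the world differently, behavior in the same environment is not necessarily the same. The thesis therefore first uses information entropy to divide the behaviors observed by the robot into several states. The robot learns each state so that it can represent the observed social norms more precisely, and it also revises previously learned results during operation to suit the dynamic environment. The thesis enables the robot to infer likely differences in social behavior from changes in human speed: transitions among three speed levels (fast, medium, slow) are treated as actions, and classifying these action sequences with information entropy lets the robot recognize different social behaviors (expressed as preferences in the thesis) and learn them.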

Keywords

Information entropy; behavior understanding; trajectory clustering; mobile robot; inverse reinforcement learning

Parallel Abstract

With the rapid development of robotics, robots have expanded from industrial production lines into daily life. Besides acting as servants, they can be pets, companions, or guides. In the near future, robots will appear in human environments such as campuses, offices, hospitals, museums, and even households. To be useful and accepted by humans, robots need to understand human behavior and how it relates to the environment. Human behavior, however, is strongly affected by implicit factors such as culture, social conventions, laws, and even the mental states of individuals and groups. If robots are to be accepted, they must conform to common social norms and local customs and recognize highly socialized spatial behaviors. The main goal of this thesis is to develop the Dynamic Spatial Behavior Cognition Model (Dynamic SBCM) for robots. The model lets a robot learn the specific, invisible rules of human society and then progressively tune the learned result while operating in the learned environment or in similar environments. The robot uses inverse reinforcement learning (IRL) to learn behavior by observing and imitating humans. However, people do not all perceive the same environment in the same way, and different perceptions lead to different actions. The thesis therefore uses information entropy to separate the observed actions into several states. The robot learns each state so that it can represent the social rules more precisely. In addition, the robot must modify the learned result during operation to adapt to the dynamic environment. The thesis also demonstrates how the same learning approach can cluster trajectories by three velocity levels (slow, medium, and fast) to describe human preferences, and how the corresponding cost function can predict those preferences.
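As a rough illustration of the entropy idea mentioned in the abstract, the sketch below discretizes a pedestrian's speed samples into three levels and computes the Shannon entropy of the resulting action sequence. It is only a minimal sketch under stated assumptions: the speed thresholds and the function names (`speed_level`, `shannon_entropy`, `trajectory_entropy`) are hypothetical, and the abstract does not say how the thesis actually defines its entropy measure or its states.

```python
import math
from collections import Counter

# Hypothetical thresholds in m/s; the abstract does not give these values.
SLOW_MAX = 0.6
MEDIUM_MAX = 1.4


def speed_level(speed):
    """Map a speed sample to one of three levels: slow, medium, fast."""
    if speed < SLOW_MAX:
        return "slow"
    if speed < MEDIUM_MAX:
        return "medium"
    return "fast"


def shannon_entropy(actions):
    """Shannon entropy (in bits) of the empirical distribution of actions."""
    total = len(actions)
    if total == 0:
        return 0.0
    counts = Counter(actions)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def trajectory_entropy(speeds):
    """Entropy of the speed-level sequence derived from raw speed samples."""
    return shannon_entropy([speed_level(s) for s in speeds])


# Two made-up trajectories: one at a steady pace, one with varied speeds.
steady = [1.0, 1.1, 0.9, 1.0]
varied = [0.3, 1.0, 1.8, 0.4, 1.2, 2.0]

print(trajectory_entropy(steady))
print(trajectory_entropy(varied))
```

A trajectory that stays at one speed level has zero entropy, while one that spreads evenly over all three levels reaches the maximum of log2(3) bits. A value of this kind could, in principle, serve as a feature for grouping trajectories before a separate model is learned for each group.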

Parallel Keywords

Information Entropy; Behavior Understanding; Trajectory Clustering; Mobile Robot; IRL
