  • Thesis

應用於辦公室情境之人機互動:以互動強化方式達成機器人行為調適之學習

Towards Human-Robot Interaction in Office Context: Learning to Adjust Robot Behaviors from Human Reinforcement

Advisor: Li-Chen Fu (傅立成)

Abstract


This thesis proposes an approach for human-robot interaction in which a robot learns users' needs and preferences and adjusts its behaviors accordingly. As robots enter daily life, the tasks assigned to them become varied and the number of people they interact with becomes very large. Robots therefore need to personalize their interactions and provide the services each user wants. Because working people spend much of their time at the workplace, this thesis studies a service robot in an office environment.

The problem differs from traditional machine learning in several respects. In human-robot interaction, training data can be collected only, or largely, from real experiments. Preferences differ between individuals, and one person's preferences may vary with internal or external factors. Finally, natural human communication and interactive behavior add further uncertainty to the robot's learning.

This thesis makes three principal contributions. First, we propose an approach under which the robot adjusts its behaviors to adapt to user preferences while it is interacting with users. The action selection method explores actions effectively based on past human responses, and the methods for approximating the reward and transition functions are designed specifically for human-robot interaction. Second, because human preferences can vary and reactions to the same robot behavior differ from person to person, the rewards produced by the pre-constructed model should be modified online. To achieve this, we examine the correlation between the robot's action and the human response, and then fine-tune the reward of the predictive model for adaptive learning. Third, our work takes into account natural human responses and the human's interaction with the environment. Together, these improve learning efficiency and reduce the human effort required for robot learning.

