Imitation Learning with Action-correlated Data Arrangement for Generating Reactive Action Policies in Dynamic Scenes

Advisors: Cheng-Fu Chou (周承復) and Chieh-Chih Wang (王傑智)

Abstract


In recent years, machine learning techniques have been applied to reduce the burden of parameter tuning and rule design in many applications. In robotics, the main purpose of adopting these techniques is to let robots learn the parameters of motion policies from data. Building on this idea, and inspired by the fact that humans acquire new skills through teaching and learning, the field of robot learning from demonstration (also known as imitation learning) has attracted growing attention. As imitation learning algorithms have developed, the research focus has shifted from theories and methods that let robots imitate human subjects to the question of how to properly represent the task to be imitated; on this basis, a representative feature set and a suitable learning model are essential to the learning process.

As one contribution of this thesis, the effects of applying action features and high-level information are presented. By combining this information with the future states induced by candidate actions, the proposed multi-step feature is constructed into a feature vector with which the learner can train a policy that reproduces the successive motion behaviors of demonstrators. Since the proposed feature representation is strongly correlated with actions, it is referred to as action-correlated data in this thesis.

Beyond action-correlated data, the main contribution of this thesis is an arrangement procedure that generates a structured policy using interactive imitation learning techniques. After the initial demonstration and training processes, failed cases trigger the arrangement of action-correlated data: additional demonstrations are acquired through an interactive learning process, and these newly obtained examples are used to train new policies instead of retraining a single policy on all collected data. For situations in which only the initial demonstration is available and interactive processes are prohibitively costly, the proposed mechanism exploits the characteristics of the learner and the demonstrations to arrange the action-correlated data automatically, so that a structured policy can still be generated even without additional demonstrations from human subjects.

Experimental results in indoor static environments and simulated dynamic environments show that the proposed method effectively learns reactive action policies from human demonstrations. Given the successful experiences in these scenarios, the proposed mechanism is well suited to autonomous robot navigation, helping robots integrate into our daily lives.
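
To make the multi-step feature concrete, the following is a minimal sketch in Python. It assumes a point robot whose state is a 2-D position, a constant-velocity motion model, and a hand-picked observation (distances to a goal and an obstacle); none of these modeling choices come from the thesis, which only specifies that the current observation is combined with the predicted future states of a candidate action.

    import numpy as np

    # Illustrative world model (an assumption, not the thesis setup).
    GOAL = np.array([5.0, 5.0])
    OBSTACLE = np.array([2.0, 2.5])
    DT = 0.5  # duration of one simulated step

    def observe(pos):
        # Hand-picked observation: distances to the goal and the obstacle.
        return np.array([np.linalg.norm(GOAL - pos),
                         np.linalg.norm(OBSTACLE - pos)])

    def simulate_step(pos, action):
        # One-step constant-velocity forward simulation.
        return pos + DT * np.asarray(action, dtype=float)

    def multi_step_feature(pos, action, horizon=3):
        # Concatenate the current observation, the candidate action, and
        # the observations of the states reached by applying that action
        # for `horizon` steps; the vector is thus correlated with the action.
        parts = [observe(pos), np.asarray(action, dtype=float)]
        p = np.asarray(pos, dtype=float)
        for _ in range(horizon):
            p = simulate_step(p, action)
            parts.append(observe(p))
        return np.concatenate(parts)

    # Two candidate actions from the same state yield different vectors,
    # which is what lets the learner score actions rather than states.
    print(multi_step_feature(np.zeros(2), [1.0, 1.0]))
    print(multi_step_feature(np.zeros(2), [1.0, -1.0]))

Because each candidate action produces its own feature vector, a learner trained on such vectors can rank actions directly instead of evaluating states alone.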
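
The arrangement procedure can likewise be sketched. The snippet below is a rough illustration rather than the thesis implementation: it keeps one sub-policy per batch of demonstrations, where the initial batch trains the first sub-policy, each failure-triggered interactive batch trains a new one, and queries are dispatched to the sub-policy whose training data lies nearest. The k-nearest-neighbor learner and the mean-distance dispatch rule are assumptions made for brevity.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    class StructuredPolicy:
        def __init__(self):
            self.sub_policies = []  # one learner per demonstration batch
            self.batch_means = []   # crude signature of each batch's region

        def add_demonstrations(self, features, actions):
            # Train a new sub-policy on the newly obtained examples only,
            # instead of retraining a single policy on all collected data.
            clf = KNeighborsClassifier(n_neighbors=3).fit(features, actions)
            self.sub_policies.append(clf)
            self.batch_means.append(features.mean(axis=0))

        def act(self, feature):
            # Dispatch to the sub-policy whose batch lies nearest the query.
            dists = [np.linalg.norm(feature - m) for m in self.batch_means]
            return self.sub_policies[int(np.argmin(dists))].predict([feature])[0]

    policy = StructuredPolicy()
    rng = np.random.default_rng(0)
    # Initial demonstrations train the first sub-policy.
    X0 = rng.normal(0.0, 1.0, size=(30, 4))
    policy.add_demonstrations(X0, (X0[:, 0] > 0).astype(int))
    # A failed case triggers interactive collection of a second batch
    # from a different region of the feature space.
    X1 = rng.normal(5.0, 1.0, size=(30, 4))
    policy.add_demonstrations(X1, (X1[:, 1] > 5).astype(int))
    print(policy.act(rng.normal(5.0, 1.0, size=4)))

Keeping the sub-policies separate preserves what was already learned: a corrective batch can only affect queries near its own region, which mirrors the motivation for arranging action-correlated data instead of retraining a single policy.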
