
Environment and Human Behavior Learning for Robot Motion Control

Advisor: 王傑智

Abstract


The Nearness Diagram (ND) method provides a reactive algorithm for robot motion control: it uses a decision tree to classify the environment into different situations and a mapping function to map each situation to a control command. However, the decision tree and mapping function used in this method are pre-defined, many parameters must be tuned, and the resulting paths do not resemble human-driven paths. Imitation learning, in contrast, is an approach for making a robot behave like a human. Based on the Markov decision process (MDP), it tries to recover a reward function from the user's control behavior, and this reward function is then used to generate control commands that imitate human behavior. However, the true reward function in a general user's mind is hard to describe, so it is difficult to compare the true reward function with the learned one. In this thesis, we combine the ND method with imitation learning. We neither use a pre-defined decision tree to classify the environment nor attempt to solve for the reward function; instead, we try to find the mapping between the environment information and the human's control behavior. Our system is briefly described as follows. First, a user is asked to control the robot, and the environment information together with the user's control data is collected as training data. The K-means method is then used to classify these data into different situations, such as going straight or turning. We propose a SIFT-like temporal feature, modeled on the Scale-Invariant Feature Transform (SIFT), to mark these situations and to remove noise. For each situation, the Adaptive Boosting (AdaBoost) algorithm is used to train a classifier. Finally, we propose a nearest neighbor controller to generate the control commands.
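To make the pipeline concrete, the following is a minimal sketch of the data-gathering and situation-clustering step, assuming each training sample concatenates a laser-scan-like environment vector with the user's control command. The array shapes, sensor layout, number of situations, and the use of scikit-learn are illustrative assumptions, not the implementation described in the thesis.

```python
# Sketch only: cluster joint (environment, control) samples into situations.
# All shapes and values below are hypothetical placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical training data gathered while a user drives the robot:
# 500 samples x (36 range readings + 2 control values: linear v, angular w).
env_readings = rng.uniform(0.2, 5.0, size=(500, 36))   # environment information
controls = rng.uniform(-1.0, 1.0, size=(500, 2))        # user's control behavior
training_data = np.hstack([env_readings, controls])

# Cluster into situations, e.g. "going straight" vs. "turning".
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(training_data)
situation_labels = kmeans.labels_

for s in range(4):
    mean_turn = controls[situation_labels == s, 1].mean()
    print(f"situation {s}: {np.sum(situation_labels == s)} samples, "
          f"mean angular command {mean_turn:+.2f}")
```

Clustering on the joint vector, rather than on the environment alone, lets samples with similar surroundings but different human commands fall into different situations, which is the distinction the later per-situation classifiers must learn.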

Keywords

environment, human behavior, learning, robot, control

Parallel Abstract


The Nearness Diagram (ND) method provides a reactive algorithm for robot motion control. It uses a decision tree to classify the environment into several situations, and a mapping function is used to generate control commands from those situations. However, the decision tree and the mapping function are pre-defined, and many parameters need to be tuned manually. Moreover, the generated path is not humanlike. The imitation learning method is an approach that aims to make a robot behave like a human. It is based on the Markov decision process (MDP), a framework for modeling the environment, and tries to extract the reward function of the MDP from a given human's control behavior. The reward function is then used to generate control commands that imitate the human's behavior. Unfortunately, for general users the true reward function in their minds is hard to describe, so it is difficult to compare the learned reward function with the ground truth. In this thesis, we combine the ND method and the imitation learning method. We do not use a pre-defined decision tree to classify the environment as in the ND method, nor do we solve for the reward function as in the imitation learning method. Instead, we try to find a mapping from the environment information to the human's control behavior. Our system is briefly described as follows. First, several users are asked to control the robot; the environment information and the users' control data are then gathered as training data. The incremental K-means method is used to classify the training data into different situations, such as going straight or turning. Borrowing the concept of the scale-invariant feature transform (SIFT) from computer vision, a SIFT-like temporal feature is proposed to mark the different situations and to eliminate noise. The Adaptive Boosting (AdaBoost) algorithm is applied to train one classifier for each situation. Finally, a nearest neighbor controller is proposed to generate the control commands.
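As a companion to the clustering sketch above, the sketch below illustrates, under the same simplifying assumptions, how one AdaBoost classifier per situation and a nearest neighbor lookup could be combined to produce a control command: the classifiers pick the current situation from the environment features, and the command of the most similar training sample in that situation is reused. The function names, the feature layout, and the omission of the SIFT-like temporal feature are assumptions for illustration, not the thesis implementation.

```python
# Sketch only: per-situation AdaBoost classifiers + nearest-neighbor control.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors

def train_situation_classifiers(env, labels, n_situations):
    """Train one one-vs-rest AdaBoost classifier per situation
    (assumes every situation contains both positive and negative samples)."""
    classifiers = []
    for s in range(n_situations):
        clf = AdaBoostClassifier(n_estimators=50, random_state=0)
        clf.fit(env, (labels == s).astype(int))   # 1 = "this situation"
        classifiers.append(clf)
    return classifiers

def nearest_neighbor_command(env_now, classifiers, env, labels, commands):
    """Pick the most confident situation, then reuse the human command of the
    nearest training sample inside that situation."""
    scores = [clf.predict_proba(env_now[None, :])[0, 1] for clf in classifiers]
    s = int(np.argmax(scores))
    members = np.flatnonzero(labels == s)
    nn = NearestNeighbors(n_neighbors=1).fit(env[members])
    _, idx = nn.kneighbors(env_now[None, :])
    return commands[members[idx[0, 0]]]   # (v, w) actually issued by a human

# Usage with the hypothetical arrays from the previous sketch:
# clfs = train_situation_classifiers(env_readings, situation_labels, 4)
# v, w = nearest_neighbor_command(env_readings[0], clfs, env_readings,
#                                 situation_labels, controls)
```

Because the controller copies a command that a human actually issued, rather than synthesizing one from a learned reward function, the output stays within the range of observed human behavior, which is the humanlike quality the abstract emphasizes.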

Parallel Keywords

environment, human behavior, learning, robot, control

References


Ratliff, N., Bradley, D., Bagnell, J., and Chestnutt, J. (2007). Boosting structured prediction for imitation learning. In Advances in Neural Information Processing Systems, Vancouver, B.C., Canada.
Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6, 679–684.
Green, D. and Swets, J. (1966). Signal detection theory and psychophysics. New York: John Wiley and Sons Inc.
Hartigan, J. A. and Wong, M. A. (1979). A k-means clustering algorithm. Applied Statistics, 28(1), 100–108.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
