
Self-Collision Avoidance and Motion Control for Dual-Arm Robot Based on Deep Reinforcement Learning

Advisors: 翁慶昌, 劉智誠
The full text will be available for download on 2024/08/28.


Abstract


In this thesis, a self-collision avoidance method based on deep reinforcement learning (DRL) and an effective training scheme are proposed so that a dual-arm robot with two 7-degree-of-freedom (7-DOF) arms can effectively avoid self-collisions, joint limits, and singularities. There are two main parts: (1) motion control of the dual-arm robot and (2) self-collision avoidance based on DRL. In the motion-control part, an analytical inverse-kinematics method is proposed: by assuming a set of virtual three links, forward kinematics and a geometric method are applied to obtain the inverse-kinematics solution of the 7-DOF redundant arm. In addition, for linear trajectory planning in the workspace, two attitude trajectories are generated by forward and backward spherical linear interpolation (SLERP), and a best-of-two selection method is proposed to choose which attitude trajectory to execute, which mitigates the problem that a single trajectory may exceed the joint limits of the arm during motion. In the self-collision avoidance part, a collision detection method based on link information is used, and a 3D dynamic simulation environment is constructed in the Gazebo simulator to train the neural networks. The left and right arms of the dual-arm robot each have a neural network that controls the arm's end-effector to move to the desired position and orientation. The Soft Actor-Critic (SAC) algorithm is applied to train these two neural networks simultaneously; moreover, each arm is treated as part of the other arm's training environment to reduce the complexity of the environment. The inputs of the neural networks require no additional sensors; only the joint angles and joint positions provided by the kinematics are used. The proposed method therefore reduces the computational cost of data processing and is easy to port to other dual-arm robot systems. With appropriate design of the neural-network inputs and reward functions, the robot performs the expected self-collision avoidance and effectively avoids the joint limits and singularities of the arm. In the experiments, 1,000 randomly generated sets of initial and target arm positions verify that the proposed method is effective at avoiding self-collisions, joint limits, and singularities.
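The forward/backward attitude-trajectory generation described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it assumes attitudes are represented as unit quaternions in (w, x, y, z) order, and it realizes the "backward" trajectory as interpolation along the long arc of the quaternion sphere, obtained by negating the goal quaternion (which denotes the same rotation); the function names are hypothetical.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = np.clip(np.dot(q0, q1), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between q0 and q1 on the 4D unit sphere
    if np.sin(theta) < 1e-9:
        # Endpoints coincide (or are antipodal, where the path is
        # ambiguous); fall back to the start quaternion.
        return q0.copy()
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * q0 + s1 * q1

def attitude_trajectories(q_start, q_goal, n=50):
    """Return the forward (short-arc) and backward (long-arc) attitude
    trajectories from q_start to q_goal as lists of unit quaternions."""
    q_start = q_start / np.linalg.norm(q_start)
    q_goal = q_goal / np.linalg.norm(q_goal)
    if np.dot(q_start, q_goal) < 0.0:
        q_goal = -q_goal  # put q_goal on the short-arc side first
    ts = np.linspace(0.0, 1.0, n)
    forward = [slerp(q_start, q_goal, t) for t in ts]
    # -q_goal represents the same rotation, but interpolating toward it
    # traverses the long way around the sphere.
    backward = [slerp(q_start, -q_goal, t) for t in ts]
    return forward, backward
```

A selection step in the spirit of the thesis would then evaluate inverse kinematics along both candidate trajectories and execute the one that keeps every joint within its limits.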

