
A Human Action Recognition System for Indoor Mobile Robots

Advisor: Chiung-Yao Fang (方瓊瑤)

Abstract


This study proposes a vision-based human action recognition system built on deep learning techniques. The system can recognize human actions successfully even while the camera approaches the person from various directions, so the proposed method is useful for the vision system of companion robots.

The system uses three kinds of information for action recognition: color image sequences, optical flow sequences, and depth image sequences. First, the depth sensor and RGB camera of a Kinect 2.0 capture the color and depth image sequences simultaneously. HOG features are then extracted from the color sequence, and an SVM classifier detects the human region.

The color frames are cropped according to the detected human region, and the corresponding optical flow sequence is computed from the cropped color sequence with the method proposed by Farnebäck. A frame sampling technique then trims all sequences to the same length. The sampled sequences are fed into three modified 3D convolutional neural networks (3D CNNs), which extract the spatial and temporal features of human actions and perform recognition. Finally, the three recognition results are integrated to output the final action recognition result.

The proposed system can recognize 13 kinds of human actions: drinking while sitting, drinking while standing, eating while sitting, eating while standing, using a phone, reading, sitting down, standing up, using a computer, walking (horizontal), walking (vertical), walking away from each other, and walking toward each other. The system achieves a human action recognition rate of 96.4% while the camera is moving, showing that the proposed system is robust and effective.
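As a rough illustration of the front end described above, the Python sketch below uses OpenCV to detect a person with a HOG + SVM detector, crop the frame to that region, and compute Farnebäck dense optical flow on the crop. This is a minimal sketch only: OpenCV's pretrained pedestrian detector (cv2.HOGDescriptor_getDefaultPeopleDetector) and all window and flow parameters stand in for the detector and settings trained in this thesis, which the abstract does not specify; the function names detect_person and crop_and_flow are illustrative.

import cv2
import numpy as np

# HOG + SVM person detector; OpenCV's default pedestrian model is a stand-in
# for the SVM trained in the thesis.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_person(frame):
    """Return the highest-scoring person box (x, y, w, h), or None if none found."""
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8))
    if len(rects) == 0:
        return None
    return rects[int(np.argmax(weights))]

def crop_and_flow(prev_frame, frame, box):
    """Crop two consecutive frames to the detected region and compute
    Farneback dense optical flow between the crops."""
    x, y, w, h = box
    prev_crop = prev_frame[y:y + h, x:x + w]
    crop = frame[y:y + h, x:x + w]
    prev_gray = cv2.cvtColor(prev_crop, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) vector per pixel of the cropped region.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return crop, flow

Applying crop_and_flow to every consecutive frame pair yields the cropped color sequence and the corresponding optical flow sequence used as two of the three input modalities.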

Parallel Abstract (English)


This study presents a vision-based human action recognition system using deep learning techniques. The system can recognize human actions successfully while the robot's camera moves toward the serviced person from various directions; the proposed method is therefore useful for the vision system of indoor mobile robots.

The system uses three kinds of information to recognize human actions: color videos, optical flow videos, and depth videos. First, a Kinect 2.0 captures color videos and depth videos simultaneously with its RGB camera and depth sensor. Second, histogram of oriented gradients (HOG) features are extracted from the color videos and a support vector machine (SVM) is used to detect the human region. Based on the detected human region, the frames of the color video are cropped, and the corresponding frames of the optical flow video are obtained with the Farnebäck method. The number of frames in these videos is then unified by a frame sampling technique. After frame sampling, the three kinds of videos are input into three modified 3D convolutional neural networks (3D CNNs), which extract the spatial and temporal features of human actions and recognize them respectively. Finally, these recognition results are integrated to output the final recognition result of human actions.

The proposed system can recognize 13 kinds of human actions, including drink (sit), drink (stand), eat (sit), eat (stand), read, sit down, stand up, use computer, walk (horizontal), walk (vertical), play with phone/tablet, walk apart from each other, and walk towards each other. The average human action recognition rate over 369 testing human action videos was 96.4%, indicating that the proposed system is robust and efficient.
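The back end of the pipeline can be pictured with the following Python sketch: uniform frame sampling to a fixed clip length, one small 3D CNN per modality (color, optical flow, depth), and late fusion of the three streams' outputs. This is a sketch under stated assumptions, not the thesis's implementation: the 16-frame clip length, the layer configuration of Small3DCNN, and fusion by averaging softmax scores are all illustrative choices, since the abstract only says the three recognition results are integrated; the names sample_frames, Small3DCNN, streams, and recognize are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_frames(video, num_frames=16):
    """Uniformly sample a fixed number of frames so every clip has equal length.

    video: tensor of shape (T, C, H, W); returns (num_frames, C, H, W).
    Permute to (C, T, H, W) and add a batch dimension before feeding a 3D CNN.
    """
    t = video.shape[0]
    idx = torch.linspace(0, t - 1, num_frames).long()
    return video[idx]

class Small3DCNN(nn.Module):
    """A toy 3D CNN per stream; the thesis's modified 3D CNNs are not specified here."""
    def __init__(self, in_channels, num_classes=13):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):  # x: (N, C, T, H, W)
        x = self.features(x).flatten(1)
        return self.classifier(x)

# One network per modality: RGB (3 channels), optical flow (2), depth (1).
streams = {"rgb": Small3DCNN(3), "flow": Small3DCNN(2), "depth": Small3DCNN(1)}

def recognize(clips):
    """Late fusion: average the per-stream softmax scores and take the argmax.

    clips: dict mapping modality name to a (N, C, T, H, W) tensor.
    """
    scores = [F.softmax(streams[name](clip), dim=1) for name, clip in clips.items()]
    return torch.stack(scores).mean(dim=0).argmax(dim=1)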

