
Skeleton-based Action Recognition using Multi-stream 3D Convolutional Neural Networks

Advisors: 蔡文錦, 陳華總

Abstract


In recent years, human action recognition has found applications in many fields, such as home care, surveillance systems, and human-computer interaction. Skeletal data is not easily affected by illumination changes and delineates the human silhouette well, so skeleton-based action recognition is developing rapidly. This thesis proposes a multi-stream architecture for skeleton-based human action recognition. First, the captured skeleton coordinates are normalized to resolve the problem of viewpoint variation. Next, we apply three different skeleton visualizations to the normalized 3D skeleton data, converting the 3D information into 2D image sequences; the 2D images produced by the different methods complement one another. These images are then fed into independent 3D convolutional neural networks for classification, and the results of the different streams are fused to obtain the final prediction. Experiments on the NTU RGB+D dataset show that the proposed method achieves excellent results.
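The first step above, normalizing skeleton coordinates to handle viewpoint variation, can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the choice of root joint and the global scale normalization are assumptions.

```python
import numpy as np

def normalize_skeleton(joints, root_idx=0):
    """Normalize a (T, J, 3) skeleton sequence (T frames, J joints).

    Translates all frames so the root joint of the first frame sits at
    the origin, then divides by the overall coordinate spread so that
    sequences captured at different distances are comparable.
    """
    joints = np.asarray(joints, dtype=np.float64)
    centered = joints - joints[0, root_idx]   # root of first frame -> origin
    spread = np.ptp(centered)                 # max - min over all coordinates
    return centered / spread if spread > 0 else centered
```

A real view-invariant regularization would typically also rotate the body to a canonical facing direction; the sketch keeps only the translation and scale parts.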

Abstract (English)


In recent years, skeleton-based action recognition has developed rapidly, since action recognition can be applied to home-care systems, intelligent surveillance, and human-computer interaction. Compared to RGB data, skeletal data is less sensitive to illumination changes and more reliable for estimating body silhouettes. In this thesis, we propose a multi-stream architecture for skeleton-based action recognition. First, the captured skeletal coordinates are regularized to resolve the view-variant problem. Second, the regularized 3D skeleton data are transformed into 2D image sequences using the proposed visualization approaches: skeleton visualization, part decomposition, and view composition. The images generated by the different visualization methods are complementary to each other. Third, each type of image is fed to a dedicated 3D ConvNet, which exploits both spatial and temporal features for human action recognition. The final result is obtained by score fusion over the multiple 3D ConvNets. Experiments conducted on the NTU RGB+D dataset demonstrate the superiority of the proposed method.
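The remaining pipeline stages, turning 3D skeletons into 2D images and fusing per-stream scores, can be sketched as below. Both functions are illustrative assumptions: the coordinate-to-color mapping is a common skeleton-visualization idea (each joint's (x, y, z) becomes an (R, G, B) pixel), and the fusion is a simple weighted average; the thesis's actual visualizations and fusion weights are not specified in this abstract.

```python
import numpy as np

def skeleton_to_image(joints):
    """Map a (T, J, 3) joint sequence to a (T, J, 3) pseudo-color image:
    each (x, y, z) coordinate becomes one (R, G, B) pixel, scaled to 0-255."""
    j = np.asarray(joints, dtype=np.float64)
    lo, hi = j.min(), j.max()
    if hi == lo:
        return np.zeros(j.shape, dtype=np.uint8)
    return ((j - lo) / (hi - lo) * 255).astype(np.uint8)

def fuse_scores(stream_scores, weights=None):
    """Late fusion: weighted average of per-stream class scores (each (C,)).

    Returns the predicted class index and the fused score vector."""
    scores = np.stack(stream_scores)                          # (S, C)
    w = np.ones(len(stream_scores)) if weights is None \
        else np.asarray(weights, dtype=np.float64)
    fused = w @ scores / w.sum()                              # weighted mean
    return int(np.argmax(fused)), fused
```

In the multi-stream setting, each visualization type would produce its own image sequence, each sequence would pass through its own 3D ConvNet, and `fuse_scores` would combine the resulting class-score vectors.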

