  • Degree thesis

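以快慢雙流圖卷積神經網路架構實現骨架動作辨識 (Skeleton-Based Action Recognition with a SlowFast Two-Stream Graph Convolutional Network Architecture)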

SlowFast-GCN: A Novel Skeleton-Based Action Recognition Framework

Advisor: 林政宏

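Abstract


This thesis addresses skeleton-based action recognition. Previous work on this task has rarely discussed the learning of temporal features and has mostly studied how to learn better spatial features. Experience with action recognition, however, shows that the temporal dimension has a large effect on the task, so we focus on how the temporal dimension affects it. We propose a two-stream network architecture that fuses inputs at different temporal scales in order to extract static and dynamic features. We then refine the adjacency matrix inside the graph convolution so that it is learned for different time segments, which yields more accurate skeleton correlations. The experimental results show that mixing features from different temporal scales effectively improves accuracy: the model reaches 94.8% accuracy on NTU RGB+D, and 95.2% once the adjacency matrix is refined. This confirms that features at the temporal scale are important for skeleton-based action recognition.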

Parallel Abstract


This thesis addresses skeleton-based action recognition. Most previous research on this task has studied how to learn better spatial features and has seldom discussed the learning of temporal features. However, experience with action recognition shows that features in the time dimension strongly affect accuracy. We therefore focus on the effect of temporal features on this task and propose a two-stream network, called SlowFast-GCN, that extracts static and dynamic features simultaneously and fuses features from different time scales. We further improve the adjacency matrix inside the graph convolution so that it is learned for different time periods, which yields more accurate skeleton correlations. Experimental results show that mixing features of different time scales effectively increases recognition accuracy: the proposed SlowFast-GCN achieves 94.8% accuracy on NTU RGB+D, and 95.2% after the adjacency matrix is improved. These results show that temporal features are very important for skeleton-based action recognition.
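
The abstract gives no implementation details, so the following is only a rough sketch of the two ideas it names: feeding the same skeleton sequence to two streams at different time scales, and using a different adjacency matrix for each time segment. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the function names, the strides of 4 and 1, the equal-length contiguous segments, the temporal averaging, and fusion by concatenation. A real model would also apply learned weights and nonlinearities, which are omitted.

```python
from typing import List

Matrix = List[List[float]]   # joints x joints adjacency matrix
Frame = List[List[float]]    # one frame: joints x channels
Sequence = List[Frame]       # frames x joints x channels


def sample_frames(frames: Sequence, stride: int) -> Sequence:
    """Keep every `stride`-th frame; a larger stride gives a coarser time scale."""
    return frames[::stride]


def aggregate(frame: Frame, adjacency: Matrix) -> Frame:
    """One graph-aggregation step (A @ X): each joint sums the features of
    the joints it is connected to, weighted by the adjacency matrix."""
    channels = len(frame[0])
    return [
        [sum(a * frame[j][c] for j, a in enumerate(row)) for c in range(channels)]
        for row in adjacency
    ]


def segmented_aggregate(frames: Sequence, adjacencies: List[Matrix]) -> Sequence:
    """Split the sequence into len(adjacencies) contiguous time segments and
    aggregate each frame with the adjacency matrix of its own segment."""
    n_seg = len(adjacencies)
    out = []
    for t, frame in enumerate(frames):
        seg = t * n_seg // len(frames)  # index of the segment containing frame t
        out.append(aggregate(frame, adjacencies[seg]))
    return out


def temporal_mean(frames: Sequence) -> Frame:
    """Average over time, leaving one joints x channels descriptor."""
    n = len(frames)
    joints, channels = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[v][c] for f in frames) / n for c in range(channels)]
        for v in range(joints)
    ]


def two_stream_features(
    frames: Sequence,
    adjacencies: List[Matrix],
    slow_stride: int = 4,   # assumed value, not from the thesis
    fast_stride: int = 1,   # assumed value, not from the thesis
) -> Frame:
    """Run a coarse (slow) and a fine (fast) stream over the same sequence,
    then fuse them by concatenating their channels joint by joint."""
    slow = temporal_mean(
        segmented_aggregate(sample_frames(frames, slow_stride), adjacencies)
    )
    fast = temporal_mean(
        segmented_aggregate(sample_frames(frames, fast_stride), adjacencies)
    )
    return [s + f for s, f in zip(slow, fast)]


if __name__ == "__main__":
    # Toy input: 8 frames, 3 joints, 1 channel; every joint carries the frame index.
    toy = [[[float(t)] for _ in range(3)] for t in range(8)]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # first half: joints isolated
    chain = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]     # second half: 3-joint chain
    print(two_stream_features(toy, [identity, chain]))
```

In this toy setup each joint ends up with two channels, one from the slow stream and one from the fast stream, and the second half of the sequence is aggregated with a different graph than the first half. In a trained network the per-segment adjacency matrices would be learned parameters rather than fixed inputs.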

