Action Recognition Using Convolutional Neural Networks

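Multimedia plays an important role in daily life. Tens of thousands of videos are uploaded to the Internet, and popular topics such as basketball and baseball attract very high view counts, so information retrieval techniques are becoming increasingly important. Human action recognition can be further applied to abnormal event detection and to the analysis of human activity. The dataset used in our experiments covers human body motions and human-object interactions, such as jumping, clapping, and eating and drinking. In this thesis, we first train a model with a convolutional neural network (CNN) and then use it to extract features from the training and testing videos. With these features, we use the temporal relationships among the features of the same video to train a three-layer long short-term memory (LSTM) model. Finally, we take the feature at the last time step of the last LSTM layer as the feature of the entire test video and use it for classification. In testing, our model achieves higher accuracy than several methods proposed in recent years.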

Keywords

action recognition; deep learning; convolutional neural network; long short-term memory; 3-D convolutional kernel

Parallel Abstract

Multimedia plays an important role in daily life. Hundreds of thousands of videos are uploaded to the Internet. Some popular topics, such as basketball and baseball games, have high click-through rates, so information retrieval techniques are becoming important. Human action recognition can be further applied to detecting abnormal events and analyzing human activity. The dataset used in our experiments contains human body actions and interactions with objects, such as jumping, clapping, and drinking. In this thesis, we first use a convolutional neural network (CNN) to train a model and then extract the features of the training and testing data from that model. After obtaining the features, we use the temporal information between features in the same video clip to train a three-layer long short-term memory (LSTM) model. Finally, we take the feature vector from the last LSTM layer, which summarizes the characteristics of the whole testing video, as the basis for the classification scores. The results show that the accuracy of our architecture is higher than that of several works proposed in recent years.
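
The pipeline summarized above (per-frame CNN features fed to a three-layer LSTM, with the top layer's final hidden state used as the clip-level feature) can be illustrated with a small forward-pass sketch. This is not the thesis implementation: the weights are random and untrained, the CNN features are replaced by random vectors, and all names and sizes (`init_layer`, `lstm_step`, `classify_clip`, `feat_dim`, `hid`, `n_classes`, `n_frames`) are hypothetical choices made only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_layer(rng, in_dim, hid):
    """Random (untrained) weights for one LSTM layer.

    A single matrix holds the four gates (input, forget, candidate, output)
    and acts on the concatenation of the layer input and the hidden state.
    """
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid)]
         for _ in range(4 * hid)]
    b = [0.0] * (4 * hid)
    return W, b


def lstm_step(layer, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    W, b = layer
    hid = len(h)
    z = [a + bias for a, bias in zip(matvec(W, x + h), b)]
    i = [sigmoid(v) for v in z[:hid]]                # input gate
    f = [sigmoid(v) for v in z[hid:2 * hid]]         # forget gate
    g = [math.tanh(v) for v in z[2 * hid:3 * hid]]   # candidate cell values
    o = [sigmoid(v) for v in z[3 * hid:]]            # output gate
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def classify_clip(frame_features, layers, W_out):
    """Run stacked LSTM layers over a clip and classify from the final state."""
    hid = len(layers[0][1]) // 4
    states = [([0.0] * hid, [0.0] * hid) for _ in layers]
    for x in frame_features:                 # iterate over time steps
        inp = x
        for k, layer in enumerate(layers):   # iterate over stacked layers
            h, c = lstm_step(layer, inp, *states[k])
            states[k] = (h, c)
            inp = h                          # output of layer k feeds layer k+1
    clip_feature = states[-1][0]             # last layer, last time step
    scores = matvec(W_out, clip_feature)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return clip_feature, [e / total for e in exps]


# Hypothetical sizes; random vectors stand in for per-frame CNN features.
rng = random.Random(0)
feat_dim, hid, n_classes, n_frames = 8, 6, 3, 5
layers = [init_layer(rng, feat_dim, hid)]
layers += [init_layer(rng, hid, hid) for _ in range(2)]   # three layers total
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hid)] for _ in range(n_classes)]
frames = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(n_frames)]

clip_feature, probs = classify_clip(frames, layers, W_out)
predicted = max(range(n_classes), key=probs.__getitem__)
print(len(clip_feature), predicted)
```

The sketch keeps only the final hidden state of the top layer because, after processing every frame in order, that vector is the one that has seen the whole clip. In practice a deep-learning framework would be used for both training and inference.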

Parallel Keywords

action recognition; deep learning; convolutional neural network; long short-term memory; 3-D convolutional kernel

