Online View-Invariant Action Recognition Using Depth Information and Spatio-Temporal Matrices

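In recent years, action recognition has been a popular research topic in computer vision. To let a system interpret fine and complex actions in the most natural way, as close as possible to how humans do, we design our system on a vision basis. When humans recognize another person's body movements, they do not need to view the performer from directly in front; as long as enough visual information is available, they can recognize the action from any viewpoint. In this thesis, our goal is therefore to build a vision-based action recognition system that is unaffected by viewpoint and can effectively distinguish human actions whenever sufficient body information is available. To this end, we adopt the concept of self-similarity. Even when the same action is performed, different viewpoints see different images and therefore yield different extracted features. Thus, unlike previous approaches that build models directly on the extracted features, we compute the feature distances between all pairs of frames and store them in a matrix called the Self-Similarity Matrix, which we further partition into multiple submatrices. We then represent each submatrix with our proposed Temporal-Pyramid Bag-of-Words, and represent an action by the pyramid bag-of-words of all its submatrices. Finally, we use the Temporal-Pyramid Bag-of-Words as input vectors to train a Support Vector Machine, thereby achieving view-invariant action recognition.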
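As a rough illustration of the self-similarity step described above, the following sketch builds a matrix of pairwise distances between per-frame feature vectors and cuts it into submatrices. Several details are assumptions made for the example and are not stated in the abstract: the function names `self_similarity_matrix` and `diagonal_submatrices`, the toy two-dimensional features, and in particular the choice of square blocks taken along the main diagonal (the abstract only says the matrix is partitioned into submatrices).

```python
import math

def self_similarity_matrix(features):
    """Return the matrix of pairwise Euclidean distances between frame features."""
    n = len(features)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            ssm[i][j] = d  # the matrix is symmetric,
            ssm[j][i] = d  # so fill both halves at once
    return ssm

def diagonal_submatrices(ssm, size, step):
    """Cut square blocks of side `size` along the main diagonal, `step` frames apart.

    Assumption: this blocking scheme is only one possible way to partition the matrix.
    """
    n = len(ssm)
    blocks = []
    for start in range(0, n - size + 1, step):
        blocks.append([row[start:start + size] for row in ssm[start:start + size]])
    return blocks

# Toy per-frame features (hypothetical values, one vector per frame).
frames = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (3.0, 4.0)]
ssm = self_similarity_matrix(frames)
blocks = diagonal_submatrices(ssm, size=2, step=1)
```

Because each entry depends only on how frames of the same sequence relate to one another, the matrix tends to keep a similar structure when the camera view changes, which is the property the thesis relies on.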

Keywords

Action recognition; view-invariant; self-similarity

Parallel Abstract

Human action recognition has drawn considerable attention in computer vision. We adopt a vision-based approach so that a computer system can interpret human actions naturally. When people recognize the actions of others, the actor does not have to stand directly in front of the observer. Therefore, in this thesis, we aim to build a vision-based action recognition system that is invariant to viewpoint. To achieve this goal, we adopt the idea of self-similarity. When video sequences record the same action from different camera views, the resulting appearances differ substantially. Consequently, features extracted directly from the raw video also differ across views. Instead of extracting spatio-temporal features for every frame and using these feature vectors directly, we compute the Euclidean distances between the feature vectors of all pairs of frames and store them in a Self-Similarity Matrix (SSM). To recognize the action, we describe the local tendency of the SSM using a pyramid-structured bag-of-words and train a Support Vector Machine as our classifier. Extensive experiments have been conducted to validate the proposed action recognition system.
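The pyramid-structured bag-of-words can be pictured with a minimal sketch, assuming each local descriptor has already been quantized to a codeword index and that the descriptors are ordered in time. The function name `temporal_pyramid_bow`, the vocabulary size, and the scheme of halving the time axis at each level are assumptions chosen for illustration; the abstract does not give the exact construction.

```python
def temporal_pyramid_bow(words, vocab_size, levels):
    """Concatenate codeword histograms computed over 1, 2, 4, ... temporal segments."""
    n = len(words)
    descriptor = []
    for level in range(levels):
        segments = 2 ** level  # level 0: whole sequence, level 1: halves, ...
        for s in range(segments):
            lo = s * n // segments
            hi = (s + 1) * n // segments
            hist = [0] * vocab_size
            for w in words[lo:hi]:
                hist[w] += 1  # count each codeword inside this segment
            descriptor.extend(hist)
    return descriptor

# Hypothetical codeword indices for eight time-ordered local descriptors.
words = [0, 0, 1, 2, 2, 2, 1, 0]
vec = temporal_pyramid_bow(words, vocab_size=3, levels=2)
# vec == [3, 2, 3, 2, 1, 1, 1, 1, 2]
```

The coarse level records which codewords occur at all, while the finer levels record roughly when they occur. A vector of this kind could then serve as the input to a Support Vector Machine.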

Parallel Keywords

Action Recognition; View-Invariant; Self-Similarity

