
Online View-invariant Human Action Recognition Using RGB-D Spatio-temporal Matrix

Advisor: Li-Chen Fu (傅立成)

Abstract


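In recent years, action recognition has become a popular research topic in computer vision. To let a system interpret fine and complex actions in the most natural way, the way closest to how humans do it, we design our system around vision. When people recognize the body movements of others, they do not need to stand directly in front of the performer; as long as enough visual information is available, they can recognize the action from any viewpoint. Therefore, in this thesis, our goal is to build a vision-based action recognition system that is unaffected by viewpoint and can effectively distinguish human actions whenever sufficient body information is available. To achieve this, we adopt the concept of self-similarity. Even when the same action is performed, different viewpoints see different images and therefore yield different extracted features. Hence, instead of building a model directly from the extracted features as in previous work, we compute the feature distances between all pairs of frames and store them in a matrix called the Self-Similarity Matrix, which we then divide into multiple sub-matrices. Next, we represent each sub-matrix with our proposed Temporal-Pyramid Bag-of-Words, and represent an action by the pyramid bags-of-words of all its sub-matrices. We use the Temporal-Pyramid Bag-of-Words as the input vector to train a Support Vector Machine, thereby achieving view-invariant action recognition.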
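The Self-Similarity Matrix step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes each frame has already been reduced to a fixed-length feature vector, uses the Euclidean distance as the feature distance, and cuts sub-matrices as square blocks sliding along the main diagonal. The abstract does not say how the matrix is divided, so `diagonal_submatrices`, its `size` and `step` parameters, and the toy feature values are all hypothetical.

```python
import math


def self_similarity_matrix(features):
    """Return the n x n matrix of Euclidean distances between all frame pairs."""
    n = len(features)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            # The distance is symmetric, so fill both halves at once.
            ssm[i][j] = d
            ssm[j][i] = d
    return ssm


def diagonal_submatrices(ssm, size, step):
    """Cut square blocks along the main diagonal (assumed windowing scheme)."""
    n = len(ssm)
    blocks = []
    for start in range(0, n - size + 1, step):
        rows = ssm[start:start + size]
        blocks.append([row[start:start + size] for row in rows])
    return blocks


# Toy per-frame features, made up for illustration only.
frames = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [3.0, 4.0]]
ssm = self_similarity_matrix(frames)
blocks = diagonal_submatrices(ssm, size=2, step=1)
```

Each entry depends only on how two frames of the same sequence differ from each other, which is why the abstract builds the model on these distances rather than on the raw features.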

Parallel Abstract


Understanding human action has drawn considerable attention in the field of computer vision. We choose a vision-based design so that a computer system can understand human actions naturally. When people recognize the actions of others, the actors do not have to stand right in front of the observer. Therefore, in this thesis, we aim to build a vision-based action recognition system that is invariant to the viewpoint. To achieve this goal, we adopt the idea of self-similarity. When two video sequences record the same action from different camera views, the resulting appearances of the action can be entirely different. Consequently, if we simply apply feature extraction methods to the raw video, we end up with very different features. Instead of extracting spatio-temporal features for every frame and using these feature vectors directly, our study uses the Euclidean distances between the feature vectors, stored in a Self-Similarity Matrix (SSM). To recognize the action, we describe the local tendency of the SSM using a pyramid-structured bag-of-words and train a Support Vector Machine as our classifier. Extensive experiments have been conducted to validate the proposed action recognition system.
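A pyramid-structured bag-of-words over time can be sketched as below. This is a generic temporal pyramid under stated assumptions, not the thesis's exact descriptor: it assumes the local SSM descriptors have already been quantized into codeword indices ordered by time, and that level `l` of the pyramid splits the sequence into `2**l` equal segments. The function name `temporal_pyramid_bow`, its parameters, and the toy codeword sequence are hypothetical.

```python
def temporal_pyramid_bow(words, vocab_size, levels):
    """Concatenate codeword histograms over 1, 2, 4, ... equal temporal segments."""
    n = len(words)
    descriptor = []
    for level in range(levels):
        segments = 2 ** level
        for s in range(segments):
            # Integer arithmetic keeps the segments contiguous and non-overlapping.
            lo = s * n // segments
            hi = (s + 1) * n // segments
            hist = [0] * vocab_size
            for w in words[lo:hi]:
                hist[w] += 1
            descriptor.extend(hist)
    return descriptor


# Toy codeword sequence over a 3-word vocabulary, made up for illustration.
words = [0, 1, 1, 2, 0, 2, 2, 1]
desc = temporal_pyramid_bow(words, vocab_size=3, levels=2)
```

The coarsest level records which codewords occur at all, while finer levels record roughly when they occur. A vector of this kind is what would be passed to the Support Vector Machine as input.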
