
Action Recognition Robust to Occlusion Using an Efficient Part-Based Approach

Advisor: 傅立成

Abstract


Action recognition has become an active research field in recent years. To let a system interpret human actions in a natural way, vision-based systems are preferable; however, occlusion is inevitable in the real world. In this thesis we therefore aim to build a vision-based action recognition system that can recognize human actions under occlusion. To achieve this goal, we propose an efficient part-based approach. Instead of using a set of part filters to detect each part, we exploit skeleton information to efficiently divide local spatio-temporal features among the parts. Each part is then represented by the proposed Temporal-Pyramid Bag-of-Words (Temporal-Pyramid BoW), and an action is represented by the Temporal-Pyramid BoWs of all parts. For each part we train a local Support Vector Machine (SVM) with that part's Temporal-Pyramid BoW as the input vector; all local SVMs, together with a global SVM, are then applied to spot and recognize human actions. Only occlusion-free data is required for training, yet actions can be spotted and recognized under occlusion. The system spots and recognizes human actions online and is robust to self-occlusion, partial occlusion, and temporary complete occlusion, where occlusion may appear dynamically. Extensive experiments validate the performance of the system, and the results are quite promising.
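To make the pipeline concrete, the following is a minimal sketch of one plausible reading of the abstract, not the thesis implementation. All names and parameters (N_PARTS, VOCAB_SIZE, LEVELS, assign_to_part, temporal_pyramid_bow, classify, joint_to_part) are illustrative assumptions, and scikit-learn SVMs stand in for whatever SVM package the authors used. It covers the three steps named above: routing features to parts via the skeleton, building a Temporal-Pyramid BoW per part, and fusing local and global SVM scores.

```python
# A plausible sketch of the part-based pipeline described in the abstract
# (NOT the authors' implementation; all names below are assumptions).

import numpy as np
from sklearn.svm import SVC  # probability=True enables predict_proba

N_PARTS = 5       # e.g. torso, two arms, two legs (assumed partition)
VOCAB_SIZE = 100  # size of the visual-word codebook (assumed)
LEVELS = 3        # temporal pyramid of 1 + 2 + 4 segments (assumed)


def assign_to_part(feature_xy, joints_xy, joint_to_part):
    """Route one local spatio-temporal feature to a body part.

    Instead of running per-part filters, the feature is assigned to the
    part owning its nearest skeleton joint; joint_to_part[j] maps joint j
    to one of the N_PARTS groups.
    """
    nearest = int(np.argmin(np.linalg.norm(joints_xy - feature_xy, axis=1)))
    return joint_to_part[nearest]


def temporal_pyramid_bow(words, times, t0, t1):
    """Temporal-Pyramid BoW for one part over the interval [t0, t1].

    `words` holds the visual-word index of each feature (after codebook
    quantization) and `times` its timestamp. The interval is split into
    1, 2, 4, ... segments; a normalized BoW histogram per segment is
    concatenated into one descriptor.
    """
    hists = []
    for level in range(LEVELS):
        n_seg = 2 ** level
        edges = np.linspace(t0, t1, n_seg + 1)
        for s in range(n_seg):
            last = s == n_seg - 1
            in_seg = (times >= edges[s]) & (
                (times <= edges[s + 1]) if last else (times < edges[s + 1]))
            h = np.bincount(words[in_seg], minlength=VOCAB_SIZE).astype(float)
            if h.sum() > 0:
                h /= h.sum()  # L1-normalize so segment length does not matter
            hists.append(h)
    return np.concatenate(hists)


def classify(part_bows, local_svms, global_svm, occluded):
    """Late fusion of local (per-part) and global SVM scores.

    Local SVMs of occluded parts (too few visible features) are skipped,
    which is one plausible way to stay robust to partial occlusion; the
    global SVM sees the concatenation of all part descriptors. All SVMs
    are assumed trained with probability=True on the same label set, so
    their predict_proba columns line up.
    """
    probs = [svm.predict_proba(bow[None, :])[0]
             for p, (svm, bow) in enumerate(zip(local_svms, part_bows))
             if not occluded[p]]
    probs.append(global_svm.predict_proba(
        np.concatenate(part_bows)[None, :])[0])
    return int(np.argmax(np.mean(probs, axis=0)))
```

Training would follow the abstract directly: fit each local SVM on occlusion-free Temporal-Pyramid BoWs of its part (e.g. SVC(probability=True).fit(X_part, y)) and the global SVM on their concatenation. Skipping the local SVMs of occluded parts, rather than feeding them empty histograms, is only one way to realize the occlusion robustness the abstract claims; the exact fusion rule is not specified there.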

