
Human Action Recognition using Improved Vector of Locally Aggregated Descriptors with Localized Soft Assignment

Advisor: 柳金章

Abstract


Recently, high-dimensional representations such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) have been widely used in human action recognition and have shown state-of-the-art performance. This work builds upon VLAD and enhances it with localized soft assignment (LSA) and second-order statistics to provide a more effective representation. First, instead of the traditional hard assignment of VLAD, which considers only the nearest visual word when encoding a video, we adopt localized soft assignment, which considers multiple nearby visual words. However, the resulting LSA-VLAD keeps only the first-order statistics of the descriptors and visual words. Therefore, we additionally encode second-order statistics with localized soft assignment into a VLAD-like form, named LSA2-VLAD, to complement LSA-VLAD. Experimental results on the HMDB51 dataset show that the combination of LSA-VLAD and LSA2-VLAD achieves a higher recognition rate than the representations used by the compared approaches.
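
As a minimal illustrative sketch of the encoding described above (not the thesis's actual implementation: the function name, the Gaussian distance kernel used for the soft-assignment weights, the parameters k and beta, and the exact second-order residual form are all assumptions), the following Python fragment aggregates weighted first- and second-order residuals over the k nearest visual words:

import numpy as np

def lsa_vlad(descriptors, codebook, k=5, beta=1.0):
    # descriptors: (N, D) local descriptors extracted from a video
    # codebook:    (K, D) visual words learned by, e.g., k-means
    K, D = codebook.shape
    first = np.zeros((K, D))   # LSA-VLAD: first-order residuals
    second = np.zeros((K, D))  # LSA2-VLAD-like: second-order residuals (assumed form)
    for x in descriptors:
        d2 = np.sum((codebook - x) ** 2, axis=1)  # squared distances to all words
        nn = np.argsort(d2)[:k]                   # localized: only the k nearest words
        w = np.exp(-beta * d2[nn])
        w /= w.sum()                              # localized soft-assignment weights
        for j, wj in zip(nn, w):
            r = x - codebook[j]
            first[j] += wj * r                    # weighted first-order residual
            second[j] += wj * r * r               # weighted second-order residual
    def normalize(v):
        v = np.sign(v) * np.sqrt(np.abs(v))       # power normalization
        return v.ravel() / (np.linalg.norm(v) + 1e-12)
    return normalize(first), normalize(second)

The two vectors would then be concatenated and fed to a classifier. Note that with k = 1 the weights collapse to hard assignment and the first vector reduces to standard VLAD.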

