Human Action Recognition Using an Improved Vector of Locally Aggregated Descriptors with Localized Soft Assignment

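In recent years, the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) have been widely used in human action recognition and have achieved excellent performance. This study builds on VLAD and enhances it with localized soft assignment and second-order statistics to propose a more effective representation. First, unlike the hard assignment of traditional VLAD, which considers only the nearest visual word when encoding a video into VLAD, we adopt localized soft assignment to take multiple nearby visual words into account. However, the VLAD formed with localized soft assignment (LSA-VLAD) retains only the first-order statistics of the descriptors and visual words. We therefore go on to use second-order statistics, encoding them into a VLAD-like form with localized soft assignment; the result, named LSA2-VLAD, complements LSA-VLAD. Classification experiments on the HMDB51 dataset show that the proposed representation combining LSA-VLAD and LSA2-VLAD achieves a higher recognition rate than the representations used by the compared methods.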

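Keywords

vector of locally aggregated descriptors; high-dimensional representation; localized soft assignment; human action recognition; feature encoding; bag-of-visual-words model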

Parallel Abstract

Recently, high-dimensional representations such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) have been widely used in action recognition and have shown state-of-the-art performance. The proposed approach provides an effective representation that is built upon VLAD and boosted with localized soft assignment (LSA) and second-order statistics. First, we use localized soft assignment, i.e., we consider multiple nearest visual words when encoding videos into VLAD, whereas the hard assignment of traditional VLAD considers only the nearest one. However, the LSA version of VLAD (LSA-VLAD) keeps only the first-order statistics of descriptors and visual words. Second, we therefore combine LSA with second-order statistics, which are encoded into a VLAD-like form named LSA2-VLAD. Experimental results on HMDB51 show that the combination of LSA-VLAD and LSA2-VLAD achieves a higher recognition rate than the representations used by the compared approaches.
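The encoding described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the neighbourhood size `k`, the smoothing factor `beta`, the exponential weighting, the use of element-wise squared residuals as the second-order statistic, and the final L2 normalization are all assumptions made for illustration, since the abstract does not give these details.

```python
import numpy as np

def lsa_weights(X, C, k=5, beta=1.0):
    """Soft-assignment weights (N, K), non-zero only on the k nearest words.

    X: local descriptors, shape (N, D). C: visual words, shape (K, D).
    k and beta are hypothetical parameters, not values from the thesis.
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest visual words for each descriptor.
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    d_nn = d2[rows, nn]
    # Exponential weights over the k neighbours, shifted for numerical stability.
    e = np.exp(-beta * (d_nn - d_nn.min(axis=1, keepdims=True)))
    W = np.zeros_like(d2)
    W[rows, nn] = e / e.sum(axis=1, keepdims=True)
    return W

def lsa_vlad(X, C, k=5, beta=1.0, order=1):
    """Weighted residual aggregation; order=2 uses squared residuals (assumed)."""
    W = lsa_weights(X, C, k, beta)
    R = X[:, None, :] - C[None, :, :]      # residuals, shape (N, K, D)
    if order == 2:
        R = R ** 2                         # assumed second-order statistic
    V = np.einsum('nk,nkd->kd', W, R).ravel()
    norm = np.linalg.norm(V)
    return V / norm if norm > 0 else V     # L2 normalization (assumed)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))               # toy descriptors of one video
C = rng.normal(size=(4, 8))                # toy codebook of 4 visual words
combined = np.concatenate([lsa_vlad(X, C, k=2, order=1),
                           lsa_vlad(X, C, k=2, order=2)])
```

With `k=1` the weights become one-hot and the first-order encoding reduces to the hard-assignment VLAD, which makes the role of the localized soft assignment easy to see. The concatenated vector has length `2 * K * D`.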

Parallel Keywords

feature encoding; bag-of-visual-words; human action recognition; localized soft assignment; vector of locally aggregated descriptors (VLAD); action representation; high-dimensional representation; HMDB51 human motion dataset

