In recent years, high-dimensional representations such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) have been widely used in human action recognition and have shown state-of-the-art performance. This work proposes an effective representation that is built upon VLAD and enhanced with localized soft assignment (LSA) and second-order statistics. First, whereas the hard assignment of traditional VLAD considers only the nearest visual word when encoding a video, we adopt localized soft assignment, which considers multiple nearby visual words. However, the resulting LSA version of VLAD (LSA-VLAD) retains only first-order statistics of the descriptors and visual words. Second, we therefore encode second-order statistics with LSA into a VLAD-like form, named LSA2-VLAD, to complement LSA-VLAD. Experimental results on the HMDB51 dataset show that the combination of LSA-VLAD and LSA2-VLAD achieves a higher recognition rate than the representations used by the compared approaches.
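
As an illustration of the first-order encoding described above, the following is a minimal sketch of VLAD with localized soft assignment. It is not the exact formulation used in this work. The function name `lsa_vlad`, the parameters `k` (number of nearest visual words) and `beta` (kernel smoothing factor), the Gaussian-kernel weighting, and the final power and L2 normalization are all assumptions made for the example. The second-order LSA2-VLAD is not sketched, since its form is not specified here.

```python
import math


def lsa_vlad(descriptors, codebook, k=5, beta=1.0):
    """Encode local descriptors into a VLAD vector with localized soft assignment.

    descriptors: list of D-dimensional feature vectors from one video.
    codebook:    list of K visual words (D-dimensional each).
    k, beta:     hypothetical parameters (neighbours kept, kernel smoothing).
    Returns a flat list of length K * D.
    """
    n_words, dim = len(codebook), len(codebook[0])
    k = min(k, n_words)
    vlad = [[0.0] * dim for _ in range(n_words)]

    for x in descriptors:
        # Squared Euclidean distance from this descriptor to every visual word.
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
        # Keep only the k nearest visual words (the "localized" part).
        nearest = sorted(range(n_words), key=d2.__getitem__)[:k]
        # Gaussian-kernel weights (assumed form); subtracting the smallest
        # distance leaves the normalized weights unchanged but avoids underflow.
        d_min = d2[nearest[0]]
        raw = [math.exp(-beta * (d2[j] - d_min)) for j in nearest]
        total = sum(raw)
        # Accumulate the weighted residual x - c_j for each selected word.
        for j, r in zip(nearest, raw):
            w = r / total
            for t in range(dim):
                vlad[j][t] += w * (x[t] - codebook[j][t])

    flat = [v for row in vlad for v in row]
    # Signed square-root (power) normalization, then L2 normalization (assumed).
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: one 2-D descriptor, two visual words.
toy_codebook = [[0.0, 0.0], [10.0, 0.0]]
toy_descriptors = [[1.0, 0.0]]
hard = lsa_vlad(toy_descriptors, toy_codebook, k=1)          # hard assignment
soft = lsa_vlad(toy_descriptors, toy_codebook, k=2, beta=0)  # equal weights
```

With `k=1` the sketch reduces to the hard assignment of traditional VLAD, because each descriptor contributes its residual only to its nearest visual word. Larger `k` spreads each descriptor over several nearby words, and `beta` controls how quickly the weight decays with distance.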