Fast Sparse Coding for Unsupervised Feature Learning

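Image recognition is a fundamental research topic in machine vision. Image representation is a key component of any recognition system and has therefore attracted growing research attention. The bag-of-visual-words model and its extensions are widely used in image representation systems. They involve three main steps. First, low-level descriptors are extracted from densely sampled image patches. Second, an encoding step converts these low-level descriptors into high-level representations. Finally, the image is divided into small regions, pooling is applied within each region to obtain a region-level representation, and the region-level representations are concatenated to form the final image representation. In this work we propose three methods for speeding up image encoding: the first simplifies the conventional orthogonal matching pursuit algorithm; the second uses a new pooling method to accelerate the selection of visual words; the third uses locality-sensitive hashing to quickly retrieve precomputed image codes. The proposed methods accelerate the conventional encoding process while maintaining a comparable recognition rate.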
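The encoding and pooling steps described above can be illustrated with a minimal sketch. It shows the conventional (unsimplified) orthogonal matching pursuit as a baseline, not the accelerated variants proposed in this work, and it assumes max pooling over absolute code values, since the abstract does not say which pooling operator is used. The function names `omp` and `image_representation`, the region labelling, and the toy dictionary are all hypothetical and chosen only for illustration.

```python
import numpy as np


def omp(D, x, k):
    """Baseline orthogonal matching pursuit (illustrative, not the thesis variant).

    D: (d, n) dictionary whose columns are assumed to have unit norm.
    x: (d,) low-level descriptor.
    k: maximum number of nonzero coefficients in the code.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    code = np.zeros(D.shape[1])
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom improves the fit
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        code[support] = coef
    return code


def image_representation(descriptors, region_ids, D, k):
    """Encode every descriptor, max-pool per region, concatenate the results.

    region_ids[i] is a hypothetical integer label naming the image region
    that descriptor i falls into; max pooling is an assumption.
    """
    codes = np.array([omp(D, x, k) for x in descriptors])
    region_ids = np.asarray(region_ids)
    pooled = [
        np.max(np.abs(codes[region_ids == r]), axis=0)
        for r in np.unique(region_ids)
    ]
    return np.concatenate(pooled)


# Toy usage: identity dictionary, three descriptors spread over two regions.
D = np.eye(4)
descriptors = [[0, 3, 0, -2], [1, 0, 0, 0], [0, 0, 5, 0]]
feature = image_representation(descriptors, [0, 0, 1], D, k=2)
```

The least-squares re-fit inside the loop is the part of orthogonal matching pursuit that dominates its cost, which is presumably why it is a natural target for simplification.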

Keywords

unsupervised feature learning; sparse coding

