
Deep Co-occurrence Feature Learning for Visual Object Recognition

Advisor: 莊永裕 (Yung-Yu Chuang)
Co-advisor: 林彥宇 (Yen-Yu Lin)

Abstract


This thesis addresses three problems with previous approaches that integrate part-based representations and convolutional neural networks for visual recognition. First, most part-based models require the number and types of object parts to be defined manually in advance, yet the parts best suited to recognition often change with the data to be distinguished. Second, most methods need training data annotated with part locations when training the convolutional network, which is expensive to collect. Third, to express the spatial relationships between parts, previous methods often rely on exhaustive computation or on several large network streams. We propose a new co-occurrence layer that resolves all three problems. The co-occurrence layer extends the idea of a convolutional layer: the neurons in the network learn automatically to play the role of pre-defined object parts, and the layer records which parts co-occur. Inside the co-occurrence layer, any two feature maps produced by a convolutional layer act as a filter and an image, and the filter performs correlation filtering on the image. A network augmented with the co-occurrence layer remains trainable end to end, and the resulting co-occurrence features are robust to rotation, translation, and object deformation. By adding the co-occurrence layer to VGG-16 and ResNet-152, we raise the recognition accuracy on the Caltech-UCSD bird dataset to 83.6% and 85.8%, respectively. The source code of this thesis is released at https://github.com/yafangshih/Deep-COOC.

Abstract (English)


This thesis addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts. However, the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle the three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by its numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
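The mechanism described above (feature maps acting as both filters and images, with mutual correlation filtering between them) can be sketched in a few lines. Below is a minimal illustration in PyTorch; the framework choice, the class name CooccurrenceLayer, and the padding and pooling details are assumptions made for this sketch rather than the authors' implementation, which is released at https://github.com/yafangshih/Deep-COOC.

```python
# Minimal sketch (an assumption, not the released code): every feature map
# produced by a convolutional layer is correlated with every other map, one
# acting as the image and the other as the filter, and the maximal response
# over all spatial offsets is kept as the co-occurrence score of that pair.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CooccurrenceLayer(nn.Module):
    """Maps a (N, C, H, W) activation tensor to a (N, C*C) co-occurrence vector."""

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feats.shape
        features = []
        for b in range(n):
            maps = feats[b]
            image = maps.unsqueeze(1)    # (C, 1, H, W): each map as a one-channel image
            kernel = maps.unsqueeze(1)   # (C, 1, H, W): each map as a one-channel filter
            # Pairwise correlation filtering: corr[i, j] is feature map i
            # filtered by feature map j, evaluated at every spatial offset.
            corr = F.conv2d(image, kernel, padding=(h // 2, w // 2))  # (C, C, H', W')
            # Taking the maximum over offsets makes the score independent of
            # where the two parts fire, i.e. translation-invariant.
            cooc = corr.flatten(start_dim=2).max(dim=2).values        # (C, C)
            features.append(cooc.flatten())
        return torch.stack(features)     # (N, C*C)


if __name__ == "__main__":
    # Toy usage: 2 images, 8 feature maps of size 7x7. A real network such as
    # VGG-16 would feed in its last convolutional activations instead.
    layer = CooccurrenceLayer()
    feats = torch.relu(torch.randn(2, 8, 7, 7))
    print(layer(feats).shape)  # torch.Size([2, 64])
```

Correlating all C x C pairs of maps at every offset is memory-hungry, so a sketch like this is only practical at the small spatial resolutions of a network's last convolutional layer. Note also that the max response over offsets is symmetric in the two maps (up to boundary effects), so keeping only the upper triangle of the C x C matrix would roughly halve the feature dimensionality.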

