
弱監督式卷積神經網路學習於視覺顯著物體發掘

Visual Attention-getting Object Discovery via Learning Weakly Supervised CNNs

Advisor: 莊永裕 (Yung-Yu Chuang)
Co-advisor: 林彥宇 (Yen-Yu Lin)
The full text will be available for download on 2024/07/15.

Abstract


Visual attention-getting object discovery is an important and still unsolved problem in image processing and computer vision. Its goal is to produce probability maps that mark all eye-catching regions in an image. Because the produced results can indicate object locations and filter out irrelevant background, the task is crucial to applications such as image retargeting, visual tracking, object segmentation, and object recognition. Since convolutional neural networks (CNNs) can effectively learn image features and nonlinear classifiers jointly, current state-of-the-art methods are built on CNNs. Their major drawback, however, is that training a CNN requires large amounts of manually annotated pixel-level training data. Collecting such data demands substantial human effort, which limits the applicability of these methods. In this dissertation, we propose four methods to address this issue. In the first method, we integrate class-specific information into visual attention-getting object discovery and propose a weakly supervised learning method to reduce the annotation cost. The proposed method consists of two CNN-based modules, an image-level classifier and a pixel-level map generator, together with four loss functions. The results show that it outperforms existing weakly supervised and fully supervised methods. In the second method, we propose an unsupervised, end-to-end trainable method called the co-attention CNN for object co-segmentation; it effectively captures inter-image object consistency and thus outperforms existing supervised and unsupervised state-of-the-art methods. In the third method, we propose a new CNN for object co-saliency detection that jointly preserves inter-image object consistency and intra-image object saliency; it outperforms the best unsupervised methods and matches the performance of the best supervised ones. In the final method, we introduce a new and challenging task called instance-level object co-segmentation and, for this problem, propose the concept of co-peaks to localize and segment object instances across images. Based on co-peaks, we develop a simple and effective CNN-based framework for this new task. We collect four datasets for the task, and the proposed framework achieves the best performance on all four.
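To make the two-module design in the first method above more concrete, the following is a minimal PyTorch sketch, assuming a shared convolutional backbone feeding an image-level classifier and a pixel-level map generator. The layer sizes, the classification loss, and the smoothness prior shown are placeholders chosen for illustration; the dissertation's actual architecture and its four losses are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeaklySupervisedSaliency(nn.Module):
    """Illustrative two-module design: an image-level classifier and a
    pixel-level saliency map generator sharing one convolutional backbone.
    Layer sizes are stand-ins, not the dissertation's architecture."""

    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Shared feature extractor (stand-in for a pretrained backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Image-level classifier: global average pooling + linear layer.
        self.classifier = nn.Linear(128, num_classes)
        # Pixel-level map generator: 1x1 conv producing a saliency map.
        self.generator = nn.Conv2d(128, 1, kernel_size=1)

    def forward(self, images):
        feats = self.backbone(images)
        logits = self.classifier(feats.mean(dim=(2, 3)))  # image-level scores
        saliency = torch.sigmoid(self.generator(feats))   # pixel-level map
        return logits, saliency

# Toy training step with two of the kinds of losses weak supervision allows:
# a classification loss on image-level tags and a smoothness prior on the map.
model = WeaklySupervisedSaliency()
images = torch.randn(2, 3, 64, 64)
tags = torch.zeros(2, 20)
tags[:, 3] = 1.0                                          # image-level labels only
logits, saliency = model(images)
cls_loss = F.binary_cross_entropy_with_logits(logits, tags)
smooth_loss = (saliency[:, :, :, 1:] - saliency[:, :, :, :-1]).abs().mean()
loss = cls_loss + 0.1 * smooth_loss
loss.backward()
```

The point of the sketch is the division of labor: image-level tags supervise the classifier directly, while the map generator is constrained only indirectly, through losses that need no pixel-wise annotation.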

Abstract (English)


Visual attention-getting object discovery has been an active topic in the fields of image processing and computer vision for decades. Its goal is to produce saliency maps that highlight the regions of objects attracting people's attention. The task is crucial to various applications such as image retargeting, visual tracking, object segmentation, and object recognition because the produced results can indicate objects of interest and mask out the irrelevant background. Current state-of-the-art saliency detection methods adopt convolutional neural networks (CNNs) because CNNs have demonstrated effectiveness in joint visual feature extraction and nonlinear classifier learning. However, they require additional training data in the form of pixel-wise annotations, often manually drawn or delineated with tools that demand intensive user interaction. Such heavy annotation cost makes these methods less practical in many applications. In this dissertation, we address this issue by proposing four methods for visual attention-getting object discovery. In the first work, we integrate class-specific information into visual attention-getting object discovery and propose a weakly supervised learning method to reduce the annotation cost. The proposed method is composed of two CNN-based modules, an image-level classifier and a pixel-level map generator, trained with four losses. The results show that our approach outperforms the state-of-the-art weakly supervised methods and many fully supervised ones in both accuracy and efficiency. In the second work, we address unsupervised CNN-based object co-segmentation under an end-to-end trainable scheme and propose a co-attention CNN to explore inter-object consistency. The proposed method remarkably outperforms the state-of-the-art unsupervised and supervised methods on standard object co-segmentation benchmarks. In the third work, we focus on unsupervised CNN-based object co-saliency detection and propose an end-to-end trainable graphical CNN that jointly preserves inter-object consistency and explores intra-object saliency. The results show that our approach remarkably outperforms the state-of-the-art unsupervised methods and even surpasses many supervised DL-based saliency detection methods. In the final work, we tackle CNN-based instance co-segmentation, a new and challenging task, and propose the concept of co-peaks to localize and segment each object instance in the given images. Based on co-peaks, we develop a simple and effective method for instance co-segmentation. It learns a model based on the fully convolutional network (FCN) by optimizing three proposed losses, and the learned model can reliably detect co-peaks and produce co-saliency maps for instance mask segmentation. We collect four datasets for evaluating instance co-segmentation and achieve state-of-the-art performance on all of them.
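As a rough illustration of the co-peak idea in the final work, the sketch below correlates the FCN feature maps of two images and keeps locally maximal affinities as candidate co-peaks, i.e., corresponding object locations across the image pair. This is a simplification under stated assumptions (cosine affinity, a 3x3 non-maximum-suppression peak test, and peaks sought only in the first image); the dissertation's actual co-peak search and its three losses are not reproduced here.

```python
import torch
import torch.nn.functional as F

def co_peaks(feat_a: torch.Tensor, feat_b: torch.Tensor, topk: int = 5):
    """Toy co-peak search: feat_a and feat_b are (C, H, W) feature maps
    of two images. Returns the top-k pairs of spatial positions whose
    feature affinity is a local maximum in image A (a simplification of
    a joint peak test over both images)."""
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, -1), dim=0)   # (C, H*W), unit columns
    b = F.normalize(feat_b.reshape(c, -1), dim=0)
    affinity = a.t() @ b                            # (H*W, H*W) cosine affinities
    best, match = affinity.max(dim=1)               # best match in B per position in A
    response = best.reshape(1, 1, h, w)
    # A position is a peak if it survives 3x3 non-maximum suppression.
    peaks = (response == F.max_pool2d(response, 3, stride=1, padding=1))
    scores = (response * peaks).flatten()
    idx_a = scores.topk(topk).indices
    return [(divmod(int(i), w), divmod(int(match[i]), w)) for i in idx_a]

# Usage with random stand-in features (replace with real FCN activations).
pairs = co_peaks(torch.randn(64, 16, 16), torch.randn(64, 16, 16))
print(pairs)  # [((ya, xa), (yb, xb)), ...] matched peak locations
```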
