透過您的圖書館登入
IP:18.119.132.223
  • 學位論文

注意力與集叢的訓練挖掘框架於弱監督物件偵測

Attention and Ensemble Training-mining Process on Weakly Supervised Object Detection

指導教授 : 周俊廷

摘要


大量邊界框的標記成本是訓練現代物件偵測器的主要挑戰之一,為了減少對昂貴的邊界框標註的依賴,研究人員致力於僅使用圖像級別的標籤或使用少量的邊界框標註和大量的圖像及標籤學習物件偵測器。這方法稱為弱監督式物件偵測。 僅使用圖像級標籤訓練的物體偵測器有許多雜訊,例如假陰性,假陽性和不准確的邊界框。NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection (NOTE-RCNN)被提出用於處理這種雜訊標籤。然而,NOTE-RCNN在訓練挖掘過程中仍然存在嚴重的假陰性問題。在本論文中,我們沿用NOTE-RCNN的架構並提出一種基於注意力的挖掘方法,該方法使用Averaging Blur模糊所有挖掘邊界框以外的區域,以消除下一次迭代的訓練挖掘過程中潛在的假陰性。我們將基於注意力的方法與NOTE-RCNN進行比較。在PASCAL VOC 2007上的實驗證明:假陰性對圖像級別數據的比率下降了8.4%,測試集上的mAP增加了3.3%。 此外,我們提出迭代集叢的方法,從多個物體偵測器的結果中選擇更好的邊界框以提高表現。在迭代集叢中有三種選擇邊界框的方式,一致通過、多數決通過和無條件通過。我們證明了迭代集叢在理論上是可行的,一致通過可以用於希望較少假陽性的應用情境,並且無條件通過可以用於希望較多真陽性的應用程序。 在未來的研究方向上,我們可以使用一些更進階的集叢決策來進一步提高探測器的性能。

並列摘要


The object detectors trained only with image-level labels have a lot of noises, such as false negatives, false positives, and inaccurate boundaries. NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection (NOTE-RCNN) is proposed to handle such noisy labels. However, NOTE-RCNN still has a serious false negative problem in the training-mining process. In this thesis, we follow NOTE-RCNN architecture and propose an attention-based method for mining process, which use Averaging Blur to blur outside all the mined boxes to eliminate potential false negatives in the training-mining process of the next iteration. We compare our attention-based method with NOTE-RCNN. Our experiments on PASCAL VOC 2007 demonstrate: The ratio of the false negative on the image-level data decreased by 8.4% and the mAP on test set increased by 3.3%. In addition, we propose Ensemble Iteration to choose the better bounding boxes from multiple object detectors for improving the performance. There are three ensemble decisions in Ensemble Iteration, Unanimous, Simple Majority, and If-you-say-so. We show that Ensemble Iteration is theoretically feasible, Unanimous can be used for the applications which care about fewer false positives, and If-you-say-so can be used for the applications which care about the more true positive. In the future research direction, we can use some advanced ensemble decision for further improving the performance of the detector.

參考文獻


[1] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “Imagenet large scale visual recognition challenge,” 2014
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[4] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[5] Toshev and C. Szegedy, “Deeppose: Human pose estimation via deep neural networks,” in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 1653–1660, IEEE, 2014

延伸閱讀