
單樣本物件檢測藉由多功能注意力機制

One-Shot Object Detection Using Versatile Attentions

Advisor: 吳沛遠

Abstract


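In this thesis, we propose ODVA for one-shot object detection, in which the target class to be detected may be absent from the training dataset. ODVA is trained on images from seen classes; at inference time, it detects objects in the query image that match a given support image, covering both seen and unseen classes, without any model fine-tuning. Using spatial and channel attention, it encodes the distinguishable features of the support image and estimates the similarity between the query and support images. On this basis, a margin-based loss function guides ODVA to learn a suitable metric for unseen classes. Experimental evaluations on the VOC and MS-COCO datasets show that the proposed ODVA is effective compared with other recent one-shot and meta-learning work. In addition, to support interpretability, we visualize the RPN proposal regions and attention vectors, and demonstrate the effectiveness of each module in ODVA through ablation studies.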

Parallel Abstract


In this thesis, we propose ODVA for one-shot object detection, in which the object to be detected may belong to a class unseen in the training dataset. ODVA is trained with images from the seen classes, while in the inference phase it detects objects in the query image that match a given support image, whose class may be seen or unseen, without any fine-tuning. With the help of spatial and channel attention, distinguishable features in the support image are encoded and the similarity between the query and support images is estimated. Building on this similarity, a margin-based loss is designed to guide ODVA toward learning an appropriate metric for the unseen classes. Experimental evaluations on both the VOC and MS-COCO datasets show the effectiveness of the proposed ODVA compared to other state-of-the-art one-shot and meta-learning works. In addition, to support interpretability, we visualize the RPN proposals and attention vectors, and demonstrate the effectiveness of each module in ODVA through an ablation study.
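
The abstract does not give the form of the margin-based loss, so the following is only a generic illustration of the idea, not the thesis's formulation. It assumes that similarity is measured as cosine similarity between feature vectors and that the loss is a hinge-style ranking loss; the function names, the toy vectors, and the margin value of 0.5 are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_ranking_loss(sim_pos, sim_neg, margin=0.5):
    """Hinge-style loss (assumed form, not taken from the thesis).

    It is zero once the matching proposal is more similar to the
    support than the non-matching proposal by at least `margin`.
    """
    return max(0.0, margin - (sim_pos - sim_neg))


# Toy feature vectors (hypothetical values, for illustration only).
support = [1.0, 0.0, 1.0]      # encoded support image
matching = [2.0, 0.0, 2.0]     # query proposal of the same class
background = [0.0, 3.0, 0.0]   # query proposal of a different class

sim_pos = cosine_similarity(support, matching)
sim_neg = cosine_similarity(support, background)
loss = margin_ranking_loss(sim_pos, sim_neg)
```

In this toy case the matching proposal points in the same direction as the support and the background proposal is orthogonal to it, so the similarity gap exceeds the margin and the loss vanishes. A pair whose gap is smaller than the margin would receive a positive penalty, which is what pushes the learned metric to separate matching from non-matching regions.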

