
One-Shot Object Detection with Co-Attention and Co-Excitation

Advisor: 陳煥宗

Abstract


This thesis presents a method for one-shot object detection based on co-attention and co-excitation. In everyday life, humans can detect and recognize objects with high accuracy from the visual information of only a few examples, but for deep learning models, achieving reliable object detection from so few samples remains a very difficult challenge. In this thesis we study the one-shot setting and use co-attention and co-excitation to strengthen the model's learning ability. Methodologically, we adopt Faster R-CNN as the base architecture: for every feature region of the target image, we measure its similarity against the query's features and enhance the feature regions that likely contain objects. Finally, we use the query's features to select the most informative channels, amplifying useful features and discarding useless ones, which makes the similarity judgments more reliable. Our results on one-shot object detection match the current state of the art, and we have open-sourced the experimental code for future research.
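The channel-selection step described above can be illustrated with a minimal NumPy sketch. This is a simplification, not the released code: a plain sigmoid gate stands in for the learned layers of the actual model, and all names here are illustrative.

```python
import numpy as np

def co_excite(target_feat, query_feat):
    """Reweight the target's feature channels with a query-derived gate.

    target_feat: (C, H, W) feature map of the target image
    query_feat:  (C, h, w) feature map of the query patch
    """
    # Squeeze: global-average-pool the query into one descriptor per channel.
    descriptor = query_feat.mean(axis=(1, 2))        # shape (C,)
    # Gate each channel in (0, 1); the real model learns this mapping instead.
    gate = 1.0 / (1.0 + np.exp(-descriptor))
    # Co-excitation: emphasize the channels the query responds to strongly.
    return target_feat * gate[:, None, None]

excited = co_excite(np.random.rand(256, 32, 32), np.random.rand(256, 8, 8))
```

Channels that are strongly activated by the query get a gate close to 1 and pass through the target feature map nearly unchanged, while weakly activated channels are suppressed before similarity is computed.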

Abstract (English)


This thesis aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, no matter whether its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under the one-shot setting of detecting objects from both seen and never-seen classes.
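As a rough illustration of the co-attention step mentioned above, the following NumPy sketch computes non-local attention between flattened target and query features and adds the attended query context back to the target. This is an assumed simplification of the idea, not the authors' implementation; the function and variable names are our own.

```python
import numpy as np

def co_attend(target_feat, query_feat):
    """Augment target features with query context via non-local attention.

    target_feat: (N, C) flattened target features (N spatial positions)
    query_feat:  (M, C) flattened query features (M spatial positions)
    """
    affinity = target_feat @ query_feat.T          # (N, M) pairwise scores
    # Softmax over query positions, shifted for numerical stability.
    affinity -= affinity.max(axis=1, keepdims=True)
    weights = np.exp(affinity)
    weights /= weights.sum(axis=1, keepdims=True)
    context = weights @ query_feat                 # (N, C) attended query info
    # Residual connection, as in non-local blocks.
    return target_feat + context

out = co_attend(np.random.rand(100, 64), np.random.rand(16, 64))
```

Each target position thus aggregates the query features most similar to it, which is what lets the region proposal network attend to the one-shot query when generating proposals.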
