Understanding Object Relationships in Referring Expressions via Attention Shifting

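Object detection has long been studied in computer vision and has produced many significant results. In recent years, researchers have turned to understanding the relationships between objects in an image, a problem known as visual relationship detection, and these efforts have already yielded some results. SSAS [2] proposed inverting visual relationship detection: given a relationship, detect the objects it involves. The authors argued that this formulation better matches human intuition and called the problem referring relationships. We consider the idea proposed by SSAS [2] reasonable, and it can readily be applied to human-robot interaction. In this work we examine the shortcomings of SSAS [2] and propose improvements; based on our own observations, we also design a new network component. Under a comparison made as fair as we could make it, our method achieves the best performance on this task and resolves many of the problems of SSAS [2].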

Keywords

Referring relationship; Object detection; Deep learning

Parallel Abstract (English)

Object detection is a well-researched topic in computer vision. Recently, attention has turned to the visual relationships between objects, a task called visual relationship detection, on which many works have made progress in recent years. SSAS [2] argued that we should focus on the inverse problem of visual relationship detection, namely detecting the objects involved in a given relationship. The authors considered this inverse problem closer to human intuition and called it referring relationships. We find their argument reasonable, and the formulation can easily be applied to human-robot interaction scenarios. In this thesis, we examine the drawbacks of SSAS [2] and propose a solution; based on our observations, we also design a new network component. Under a comparison as fair as we could make it, we achieve state-of-the-art performance on the task and resolve the problems of SSAS [2].
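The abstract frames referring relationships as the inverse of visual relationship detection. The minimal Python sketch below is only an illustration under stated assumptions: it contrasts the two interfaces on a toy annotated scene. The forward direction maps an image to the set of `<subject, predicate, object>` triples it contains; the inverse direction takes an image plus a query triple and localizes the subject and the object. Every name here (`AnnotatedImage`, `detect_relationships`, `refer`, the box-to-mask rasterization) is hypothetical, and `refer` is an oracle lookup over ground-truth annotations, not SSAS and not the network proposed in this thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Mask = List[List[bool]]


@dataclass(frozen=True)
class Relationship:
    """A <subject, predicate, object> triple such as <person, ride, horse>."""
    subject: str
    predicate: str
    obj: str


@dataclass
class AnnotatedImage:
    """Hypothetical toy scene: instance boxes, categories, and relation annotations."""
    height: int
    width: int
    boxes: Dict[str, Box]                  # instance id -> bounding box
    categories: Dict[str, str]             # instance id -> category name
    relations: List[Tuple[str, str, str]]  # (subject id, predicate, object id)


def detect_relationships(img: AnnotatedImage) -> List[Relationship]:
    """Forward task (visual relationship detection): image -> distinct triples."""
    triples: List[Relationship] = []
    for s, p, o in img.relations:
        rel = Relationship(img.categories[s], p, img.categories[o])
        if rel not in triples:
            triples.append(rel)
    return triples


def _paint(mask: Mask, box: Box) -> None:
    """Set every pixel inside `box` to True."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = True


def refer(img: AnnotatedImage, query: Relationship) -> Tuple[Mask, Mask]:
    """Inverse task (referring relationship): (image, triple) -> (subject mask, object mask).

    Only instances that take part in the queried relationship are localized,
    so the query disambiguates between instances of the same category.
    """
    subj = [[False] * img.width for _ in range(img.height)]
    obj = [[False] * img.width for _ in range(img.height)]
    for s, p, o in img.relations:
        if Relationship(img.categories[s], p, img.categories[o]) == query:
            _paint(subj, img.boxes[s])
            _paint(obj, img.boxes[o])
    return subj, obj


def area(mask: Mask) -> int:
    """Number of pixels set in the mask."""
    return sum(sum(row) for row in mask)


# Two people, but only p1 rides the horse.
scene = AnnotatedImage(
    height=10,
    width=10,
    boxes={"p1": (0, 0, 3, 5), "p2": (6, 0, 9, 5), "h1": (0, 5, 4, 10)},
    categories={"p1": "person", "p2": "person", "h1": "horse"},
    relations=[("p1", "ride", "h1"), ("p2", "next to", "h1")],
)
print(detect_relationships(scene))
subj_mask, obj_mask = refer(scene, Relationship("person", "ride", "horse"))
print(area(subj_mask), area(obj_mask))  # 15 20
```

Because both people share the category `person`, only the predicate in the query separates the rider from the bystander. A learned model for this task has to produce such masks from pixels and the query alone, without access to the annotations this oracle reads.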

Parallel Keywords (English)

Referring relationship; Object detection; Deep learning

References

[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[2] R. Krishna, I. Chami, M. Bernstein, and L. Fei-Fei. Referring relationships. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[3] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. 2016.

[4] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal loss for dense object detection. In The IEEE International Conference on Computer Vision (ICCV), 2017.

[5] C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei. Visual relationship detection with language priors. In European Conference on Computer Vision (ECCV), 2016.