Referring Relationships Comprehension by Residual Attention Shift

(Original Chinese title: 基於焦點移轉的方式理解指稱表達式中的物件關係)

Advisor: 徐宏民 (Winston H. Hsu)

Abstract


Object detection is a long-studied problem in computer vision, with many significant results. More recently, researchers have turned to understanding the relationships between objects in an image, a task known as visual relationship detection, on which steady progress has been made in recent years. SSAS [2] argued that we should instead study the inverse problem: given a relationship, detect the objects that take part in it. Its authors argued that this inverse problem better matches human intuition, and named it referring relationships. We find this argument reasonable, and the task applies naturally to human-robot interaction scenarios. In this thesis, we examine the shortcomings of SSAS [2] and propose improvements; based on our own observations, we also design a new network component. Under a comparison that is as fair as we could make it, our method achieves state-of-the-art performance on this task and resolves many of the problems of SSAS [2].
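The referring-relationships task can be made concrete with a toy example. Given a query <subject, predicate, object> and an image, the model must locate the subject and the object; [2] treats the predicate as an operator that shifts attention from one entity to the other, and the inverse predicate shifts it back. The sketch below is a hypothetical illustration under strong simplifications: attention maps are tiny pure-Python grids, the per-category "evidence" maps are given rather than computed from image features, and each predicate is a hand-set 3x3 kernel standing in for learned parameters. All names (`PREDICATE_KERNELS`, `shift_attention`, `localize`, and so on) are invented here. It reproduces neither SSAS [2] nor the residual attention shift proposed in this thesis.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]

# Hypothetical, hand-set predicate "operators" (real models learn these):
# kernel[1 + di][1 + dj] is the weight with which attention at cell (r, c)
# is moved to cell (r + di, c + dj).
PREDICATE_KERNELS: Dict[str, Grid] = {
    "left of": [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # object one cell right of subject
    "above": [[0, 0, 0], [0, 0, 0], [0, 1, 0]],    # object one cell below subject
}


def shift_attention(att: Grid, kernel: Grid) -> Grid:
    """Move attention mass according to a 3x3 kernel (zero padding at borders)."""
    h, w = len(att), len(att[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    si, sj = i - di, j - dj  # source cell whose mass lands on (i, j)
                    if 0 <= si < h and 0 <= sj < w:
                        out[i][j] += att[si][sj] * kernel[di + 1][dj + 1]
    return out


def invert_kernel(kernel: Grid) -> Grid:
    """Rotate a kernel by 180 degrees to get the inverse shift (object -> subject)."""
    return [row[::-1] for row in kernel[::-1]]


def localize(source_att: Grid, target_evidence: Grid, kernel: Grid) -> Grid:
    """Shift the source attention, gate it with target-category evidence, normalize."""
    shifted = shift_attention(source_att, kernel)
    gated = [[s * e for s, e in zip(rs, re)] for rs, re in zip(shifted, target_evidence)]
    total = sum(map(sum, gated))
    return [[v / total for v in row] for row in gated] if total > 0 else gated


def argmax2d(grid: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the largest value (first one in row-major order)."""
    cells = ((i, j) for i in range(len(grid)) for j in range(len(grid[0])))
    return max(cells, key=lambda ij: grid[ij[0]][ij[1]])


# Toy 4x4 scene: one "person" at (1, 1) and two "table" candidates.
person = [[0.0] * 4 for _ in range(4)]
person[1][1] = 1.0
table = [[0.1] * 4 for _ in range(4)]
table[1][2] = table[2][1] = 0.9

for predicate, kernel in PREDICATE_KERNELS.items():
    obj = localize(person, table, kernel)
    print(f"<person, {predicate}, table> -> table at {argmax2d(obj)}")

# Symmetric direction: shift from the chosen table back to the person.
chosen = localize(person, table, PREDICATE_KERNELS["left of"])
back = localize(chosen, person, invert_kernel(PREDICATE_KERNELS["left of"]))
print(f"inverse shift recovers the person at {argmax2d(back)}")
```

With two tables in the scene, the object category alone is ambiguous; the predicate selects which table is meant ("left of" picks the one to the person's right, "above" the one below), which is the intuition behind posing the problem as detecting objects from a given relationship.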

References


[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[2] R. Krishna, I. Chami, M. Bernstein, and L. Fei-Fei. Referring relationships. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[3] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. 2016.
[4] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In IEEE International Conference on Computer Vision (ICCV), 2017.
[5] C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei. Visual relationship detection with language priors. In European Conference on Computer Vision (ECCV), 2016.
