
SR2-REC:基於句子重解讀和樣式正則化的適應性指述理解技術

SR2-REC: Sentence Reinterpretation and Style Regularization for Adaptable Referring Expression Comprehension

Advisors: 林嘉文, 黃敬群

Abstract


Referring Expression Comprehension (REC) is a visual-linguistic task whose goal is to identify the object in an image that a given referring expression describes. Current state-of-the-art REC models treat the referring expression as a clean source, so they do not account for cases in which the expression describes the target object poorly. Moreover, expressions can convey similar ideas in different communication styles, so an REC model needs a way to adapt to different styles in order to detect the correct target. In this thesis, we propose the SR2-REC transformer, which takes a referring expression as input and outputs multiple interpretations (sentence reinterpretation) based on a target style (sentence style regularization); these interpretations can be fed to any REC model for target identification. For sentence style regularization, we use a scene graph parser to identify a unified target style, and we use a beam-search decoding algorithm to generate multiple sentences. We integrate our SR2-REC network with state-of-the-art REC models, including ViLBERT, VL-BERT, and MCN. Target identification accuracy on RefCOCO, RefCOCO+, and RefCOCOg shows that the proposed sentence-processing method is effective even in domain-transfer tasks.

Parallel Abstract (English)


Referring Expression Comprehension (REC) is a visual-linguistic task that aims to identify an object in an image given a referring expression. Current state-of-the-art REC models treat the referring expression as a clean source; as a result, they fail to consider cases where the expression gives a poor description of the target object. Moreover, expressions can express similar ideas in different communication styles, so REC models should have a way to adapt to different communication styles in order to attain the correct detection. In this paper, we propose the SR2-REC transformer, which takes a referring expression as input and then outputs multiple interpretations (sentence reinterpretation) based on a target style (sentence style regularization), which can be fed to any REC model for target identification. For sentence style regularization, we use a scene graph parser to identify a unified target style, and we use the beam-search decoding algorithm to generate multiple sentences. We have integrated our SR2-REC network with state-of-the-art REC models, including ViLBERT, VL-BERT, and MCN. The target identification accuracy, tested on RefCOCO, RefCOCO+, and RefCOCOg, shows the proposed sentence processing method's effectiveness even in domain transfer tasks.
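The abstract's key decoding step, generating multiple reinterpretations of one expression with beam search, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `beam_search` function and the toy bigram scoring model below are hypothetical stand-ins for the SR2-REC transformer's decoder, included only to show how keeping the top-k partial hypotheses at each step yields several alternative sentences rather than a single greedy one.

```python
import math

def beam_search(step_score, vocab, start, eos, beam_width=3, max_len=6):
    """Minimal beam-search decoder: at every step, expand each partial
    sentence with every vocabulary token, keep only the beam_width
    highest-scoring hypotheses, and collect those that emit eos.
    Returns all finished hypotheses, best-first."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok in vocab:
                candidates.append((seq + [tok],
                                   score + math.log(step_score(seq, tok))))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)

# Toy scoring model (hypothetical): a bigram table with a small fallback
# probability for unseen transitions.
bigram = {
    ("<s>", "the"): 0.8, ("<s>", "man"): 0.1,
    ("the", "man"): 0.7, ("man", "left"): 0.6,
    ("left", "<eos>"): 0.9,
}

def score(seq, tok):
    return bigram.get((seq[-1], tok), 0.01)

hyps = beam_search(score, ["the", "man", "left", "<eos>"], "<s>", "<eos>")
```

Because the decoder returns every finished hypothesis rather than only the best one, the same mechanism can surface several stylistically different reinterpretations of one input expression, which is the role beam search plays in SR2-REC.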

