
Attribute Augmentation for Zero-Shot Object Detection

Advisor: 葉梅珍

Abstract

In this study, we investigate zero-shot object detection, the task of predicting the locations and labels of objects in target images, whether those labels belong to seen or unseen categories. Many generative approaches to zero-shot object detection combine the semantic attributes of each category with Gaussian noise to generate visual features; by synthesizing samples of unseen categories, they transform zero-shot object detection into an approximately supervised object detection problem. However, current generative models condition on a single, complete set of semantic attributes encompassing all attribute information of a category. Visual features generated in this manner do not sufficiently cover the real visual features, which often lack some of that attribute information, and complete features alone are insufficient for training a classifier that categorizes real visual features effectively. In light of this, we propose a method that augments the semantic attributes in order to generate diversified features simulating the actual distribution of real visual features, thereby enhancing the performance of the classifier within the object detection model. We evaluate our method on two common object detection datasets: MS COCO and PASCAL VOC.
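
To make the idea concrete, below is a minimal NumPy sketch of attribute-conditioned feature generation with attribute augmentation. It is an illustration under our own assumptions, not the thesis's implementation: the linear generator, the dimensions, and the attribute-dropout scheme (augment_attributes, drop_prob) are hypothetical stand-ins for a trained conditional generator such as a GAN generator or VAE decoder.

import numpy as np

rng = np.random.default_rng(0)

def augment_attributes(attr, drop_prob=0.2):
    # Hypothetical augmentation: randomly zero out attribute dimensions so
    # each sample is conditioned on a partial attribute vector.
    mask = rng.random(attr.shape) >= drop_prob
    return attr * mask

def generate_features(attr, generator, n_samples=64, noise_dim=16):
    # Pair Gaussian noise with an (augmented) attribute vector and map the
    # concatenation to a synthetic visual feature.
    feats = []
    for _ in range(n_samples):
        z = rng.standard_normal(noise_dim)
        a = augment_attributes(attr)
        feats.append(generator(np.concatenate([z, a])))
    return np.stack(feats)

# Stand-in for a trained conditional generator; a random linear map is used
# here only to keep the sketch self-contained and runnable.
attr_dim, noise_dim, feat_dim = 85, 16, 128
W = 0.1 * rng.standard_normal((noise_dim + attr_dim, feat_dim))
generator = lambda x: np.tanh(x @ W)

unseen_attr = rng.random(attr_dim)    # attribute vector of one unseen class
synthetic = generate_features(unseen_attr, generator)
print(synthetic.shape)                # (64, 128): training data for the classifier

Randomly zeroing attribute dimensions mimics real visual features that lack some attribute information, so the synthetic training set covers more of the real feature distribution than features generated from the single complete attribute vector alone.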
