Out-of-Distribution (OOD) detection aims to equip models with the ability to recognize inputs that fall outside the training distribution. This capability is crucial when deploying models in real-world settings, particularly in safety-critical domains such as medical diagnostics and autonomous driving. Traditionally, many studies have performed OOD detection with Convolutional Neural Networks (CNNs) operating on visual features alone. More recently, the advent of Vision-Language Models (VLMs) has opened a new avenue: leveraging the multimodal capabilities of these models for OOD detection. By combining the semantics of class labels with visual features, VLMs enable zero-shot and few-shot learning, improving adaptability and performance in diverse environments. In this work, we leverage the pretrained knowledge of VLMs and introduce a novel method, the Positive Label Semantic Ensemble, in which the model learns more precise class features by aggregating features that are semantically related to each class label. In addition, we incorporate negative semantic labels into a few-shot training scheme. Experimental results show that, with ImageNet-1K as the in-distribution dataset, our method reduces FPR95 on out-of-distribution datasets by an average of 11.33 percentage points and raises AUROC by an average of 2.47 percentage points compared to existing VLM-based methods, a substantial performance gain.
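To make the label-semantic-ensemble idea concrete, the following is a minimal sketch, not the thesis implementation: it assumes CLIP (via the Hugging Face `transformers` CLIPModel/CLIPProcessor API) as the VLM, uses hypothetical synonym lists in place of the method's actual semantically related labels, scores an image by its maximum cosine similarity to the ensembled class prototypes, and omits the few-shot training with negative semantic labels entirely.

```python
# Illustrative sketch of a CLIP-based positive label semantic ensemble
# OOD score. Each class is represented by the averaged, normalized text
# embedding of several semantically related phrases; an image is scored
# by its maximum cosine similarity to these class prototypes, and low
# scores are flagged as OOD.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical synonym sets; the actual semantically related labels
# would be produced as described in the thesis.
label_synonyms = {
    "dog": ["dog", "puppy", "canine"],
    "cat": ["cat", "kitten", "feline"],
}

@torch.no_grad()
def class_prototypes(synonyms_per_class):
    """Average the text embeddings of semantically related phrases per class."""
    protos = []
    for phrases in synonyms_per_class.values():
        texts = [f"a photo of a {p}" for p in phrases]
        tok = processor(text=texts, return_tensors="pt", padding=True)
        emb = model.get_text_features(**tok)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        protos.append(emb.mean(dim=0))  # the semantic ensemble step
    protos = torch.stack(protos)
    return protos / protos.norm(dim=-1, keepdim=True)

@torch.no_grad()
def id_score(image, protos):
    """Higher = more in-distribution (max cosine similarity to any class)."""
    pix = processor(images=image, return_tensors="pt")
    img = model.get_image_features(**pix)
    img = img / img.norm(dim=-1, keepdim=True)
    return (img @ protos.T).max().item()

protos = class_prototypes(label_synonyms)
score = id_score(Image.open("example.jpg"), protos)
# Threshold chosen on held-out ID data so that 95% of ID samples pass;
# the false positive rate of OOD samples at this threshold is FPR95.
is_ood = score < 0.25
```

The thresholding comment mirrors how the reported metrics are defined: FPR95 is the fraction of OOD inputs misclassified as in-distribution when the threshold retains 95% of in-distribution inputs, and AUROC is computed over the same scalar scores.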