Applying Noisy-Label Learning Algorithms to Visual Question Answering

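This thesis applies noisy-label learning algorithms to the training of visual question answering (VQA) models. In VQA, noisy labels refer to the fact that different people may give different answers to the same image-question pair. This usually happens when the image is easily confusing. Supervised learning with noisy labels degrades the performance of a VQA model. To address noisy labels in VQA datasets, we study three mainstream noisy-label learning algorithms: (1) loss-correction algorithms, (2) label-cleansing algorithms, and (3) graphical-model algorithms. We implement these methods in a dual attention VQA model and compare their effectiveness on the VirginiaTech VQA dataset. The experimental results show that: (1) loss-correction algorithms improve model accuracy by relying on the model network to find the correct label transition probabilities or to detect labels that are likely wrong; (2) label-cleansing algorithms can improve model accuracy if the dataset provides enough verified data to train the label-cleansing network, so that the model is able to correct wrong labels; (3) graphical-model algorithms improve model accuracy only if the model network can judge from the input data whether a label is likely to be noisy. In addition, the capability of the VQA model itself affects the performance of the noisy-label learning algorithms.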

Keywords

Supervised learning; deep learning; visual question answering; noisy labels

Parallel Abstract

This thesis studies learning algorithms that address the noisy label issues inherent in Visual Question Answering (VQA) tasks. Noisy labelling in VQA refers to the phenomenon that different human subjects may give different answers to the same image-question pair. This often arises because some image-question pairs create an ambiguous context that leads to indefinite answers. When a VQA model is trained with such noisy supervision, its performance suffers. To address noisy label issues, we first survey three mainstream approaches to learning from noisy labels: (1) loss correction, (2) label cleansing, and (3) graphical models. We then implement these algorithms on top of a dual attention VQA network (which we call the base VQA model) and test their performance on the VirginiaTech VQA dataset. Experimental results show that (1) the performance of the loss-correction algorithms relies heavily on accurate estimation of the label transition probabilities induced by noise or on accurate detection of the noise level, that (2) the label cleansing algorithms require enough verified labels to perform effectively, and that (3) the graphical models need to differentiate the noise level of each QA input to work well. In addition, the capability of the base VQA model can have a profound effect on the performance of these noisy label learning algorithms.
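To make the role of label transition probabilities in loss correction concrete, the following is a minimal sketch of one common variant, often called forward loss correction. The abstract does not state which loss-correction variant the thesis implements, so this is an illustrative assumption, not the thesis's method. The names `softmax`, `plain_loss`, `forward_corrected_loss`, and the `transition` matrix are hypothetical; `transition[i][j]` is assumed to be the probability that a sample whose true answer is class `i` is observed with noisy answer `j`.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution over answers."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def plain_loss(logits, label):
    """Standard cross-entropy against the (possibly noisy) label."""
    return -math.log(softmax(logits)[label])


def forward_corrected_loss(logits, noisy_label, transition):
    """Cross-entropy after pushing the clean prediction through the noise model.

    transition[i][j] is an ASSUMED estimate of P(noisy = j | true = i);
    in practice it would have to be estimated from data.
    """
    clean_probs = softmax(logits)
    # Probability that the observed (noisy) label equals noisy_label,
    # marginalised over the unknown true answer class.
    noisy_prob = sum(
        clean_probs[i] * transition[i][noisy_label]
        for i in range(len(clean_probs))
    )
    return -math.log(noisy_prob)


if __name__ == "__main__":
    # Toy 3-answer example: the model is confident in answer 0,
    # but the annotator recorded answer 1.
    logits = [4.0, 0.0, 0.0]
    transition = [
        [0.7, 0.3, 0.0],  # true answer 0 is recorded as answer 1 30% of the time
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print("plain loss:    ", plain_loss(logits, 1))
    print("corrected loss:", forward_corrected_loss(logits, 1, transition))
```

In this toy setting the corrected loss penalises the model less for disagreeing with a label that the assumed noise model says is plausibly flipped, which illustrates why an inaccurate transition estimate would undermine the correction.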