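In this work, we address unsupervised domain adaptation for the task of semantic segmentation. In this setting, we try to learn from a synthetic dataset with correct annotations and transfer that knowledge to real-world images without any annotations. We assume that the structure of an image is the most informative and decisive factor for semantic segmentation and is unaffected by differences between datasets. We therefore propose the Domain Invariant Structure Extraction (DISE) framework, which decomposes an image into domain-invariant structure and domain-specific texture features. The framework further enables cross-domain image translation and uses label transfer to improve the model's semantic segmentation performance. Extensive experiments verify the effectiveness of the proposed DISE model and show that it outperforms several state-of-the-art methods and has potential for other vision tasks.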
In this thesis, we tackle unsupervised domain adaptation for semantic segmentation: we aim to transfer knowledge learned from synthetic datasets with ground-truth labels to real-world images that have no annotations. We hypothesize that the structural content of an image is the most informative and decisive factor for semantic segmentation and can be readily shared across domains. Based on this hypothesis, we propose a Domain Invariant Structure Extraction (DISE) framework that disentangles images into domain-invariant structure and domain-specific texture representations. This disentanglement further enables image translation across domains and label transfer, both of which improve segmentation performance. Extensive experiments verify the effectiveness of the proposed DISE model and demonstrate its superiority over several state-of-the-art approaches, as well as its potential for other vision tasks.
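To make the structure/texture idea concrete, the following toy sketch may help. It is not the DISE model: the learned encoders and decoder are replaced by a simple statistical stand-in, in which "texture" is assumed to be the mean and standard deviation of the pixel intensities and "structure" is the normalized pattern that remains. The function names `encode`, `decode`, and `translate_with_labels` are hypothetical and chosen only for illustration. The sketch shows why label transfer is plausible: a source image re-rendered with target texture keeps its structure, so its source labels still apply.

```python
from statistics import fmean, pstdev


def encode(image):
    """Split a flat list of pixel intensities into (structure, texture).

    Assumption for this toy: texture = (mean, std) of the intensities,
    structure = the intensities normalized to zero mean and unit std.
    """
    mean = fmean(image)
    std = pstdev(image) or 1.0  # guard against a constant image
    structure = [(p - mean) / std for p in image]
    return structure, (mean, std)


def decode(structure, texture):
    """Re-render a structure pattern with the given texture statistics."""
    mean, std = texture
    return [s * std + mean for s in structure]


def translate_with_labels(src_image, src_labels, tgt_image):
    """Render the source structure in the target texture.

    The per-pixel labels are copied unchanged, because only the texture
    statistics were swapped and the structure was left as it was.
    """
    structure, _ = encode(src_image)
    _, tgt_texture = encode(tgt_image)
    translated = decode(structure, tgt_texture)
    return translated, list(src_labels)


# Hypothetical data: a labelled "synthetic" image and an unlabelled "real" one.
synthetic = [0.1, 0.9, 0.1, 0.9]
synthetic_labels = [0, 1, 0, 1]
real = [0.4, 0.5, 0.6, 0.5]

translated, transferred_labels = translate_with_labels(
    synthetic, synthetic_labels, real
)
print(translated, transferred_labels)
```

In this sketch the translated image takes on the target's intensity statistics while its normalized pattern matches the source, so the pair `(translated, transferred_labels)` could serve as an extra labelled example in the target style.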