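Multi-domain image-to-image translation is the task of translating an image from one domain into multiple other domains. In recent years, many image translation studies have used generative adversarial networks (GANs) to learn the relationships between domains from domain-labeled data and to build complex generative models. However, the effectiveness of such algorithms depends on a large amount of labeled data, so building these models is costly in both time and money. To reduce this cost, this thesis proposes SemiStarGAN, which combines two semi-supervised learning techniques, self-ensembling and pseudo labeling, and introduces a new parameter-sharing scheme called the Y model, in which the discriminator and the auxiliary classifier partially share parameters in order to improve the auxiliary classifier's generalization and stability. We design facial attribute translation experiments that compare the generation performance of StarGAN and SemiStarGAN under different amounts of labeled data. The results confirm that the proposed method achieves translation quality on par with StarGAN while requiring less labeled data.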
Recent studies have shown significant advances in multi-domain image-to-image translation, and generative adversarial networks (GANs) are widely used to address this problem. However, existing methods require a large number of domain-labeled images to train an effective image generator, and collecting that much labeled data for real-world problems takes considerable time and effort. In this thesis, we propose SemiStarGAN, a semi-supervised GAN, to tackle this issue. The proposed method exploits unlabeled images by combining a novel discriminator/classifier network architecture, the Y model, with two existing semi-supervised learning techniques: pseudo labeling and self-ensembling. Experimental results on the CelebA dataset, using facial attributes as domains, show that the proposed method achieves performance comparable to state-of-the-art methods while using considerably fewer labeled training images.
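
As a rough illustration of the pseudo labeling idea mentioned above, the sketch below assigns a label to an unlabeled image only when the classifier's top predicted probability reaches a confidence threshold. It is a generic sketch and not the procedure used in this thesis: the function name `pseudo_labels`, the threshold of 0.9, the single-label (one domain per image) setting, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Optional[int]]:
    """Return the arg-max class for each sample whose top probability
    reaches `threshold`, and None for samples that are too uncertain.

    `probs` is assumed to hold one probability vector per unlabeled image.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        # Index of the most probable class (first one wins on ties).
        best = max(range(len(p)), key=lambda k: p[k])
        # Keep the prediction as a label only if it is confident enough.
        labels.append(best if p[best] >= threshold else None)
    return labels


# Hypothetical classifier outputs for three unlabeled images over two domains.
unlabeled_probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]
print(pseudo_labels(unlabeled_probs))  # prints [0, None, 1]
```

Samples that receive `None` would simply be left out of the supervised classification loss, so only confident predictions on unlabeled images act as extra training labels.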