  • Thesis

Attribute-Based Facial Image Manipulation on Latent Space (基於屬性潛在空間上的面部圖像處理)

Advisor: 吳家麟

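Abstract


Machine learning techniques for image generation have matured rapidly, and generative adversarial networks (GANs) such as StyleGAN and BigGAN produce especially impressive results. Unfortunately, the architecture of these models makes it hard to control the output image: the generated results are good, but there is little room for fine-grained manipulation.

Some researchers have therefore tried editing latent codes in the latent space. Without changing the original model's architecture or its parameters, which were costly in time and energy to train, they control the output image simply by feeding new latent codes into the original model. These methods have their own limitations, however: for example, they do not apply when the latent space is large, or they suffer from entanglement between different features.

In this thesis, we propose two methods to address these problems. The first compresses the original latent space so that analysis methods limited by latent-space size become usable again. The second trains an additional, simpler model that produces, for each latent code, the new latent code best suited to the desired manipulation. Compared with previous methods, these two methods apply to more kinds of models while still achieving image control.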

Parallel Abstract


Machine-learning-based image generation has matured considerably, and the images produced by generative adversarial networks (GANs) are especially impressive. Unfortunately, the complicated architecture of these models makes it difficult to control the output images, so high-quality results come with little room for fine-grained manipulation. Some researchers therefore edit the latent codes of a given model directly in the latent space, manipulating the output image simply by feeding the new latent codes into the original model without changing its structure or learned parameters. However, these methods either do not scale to large latent spaces or suffer from feature entanglement. In this work, we propose two approaches to address these problems. The first compresses the original latent space so that methods limited by latent-space size become applicable again. The second trains an additional model to find an appropriate control tensor. Compared with existing methods, these two approaches can be applied to more models while still achieving image manipulation.
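
The general idea of compressing a latent space and then editing a code inside it can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual method: the abstract does not name the compression technique, so PCA is swapped in here, and the random latent codes, the 512 and 32 dimensions, the attribute direction, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(latents, k):
    """Return (mean, components) of a k-dimensional PCA basis for the latent codes."""
    mean = latents.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt[:k]

def compress(z, mean, components):
    """Project a latent code onto the compressed (PCA) space."""
    return (z - mean) @ components.T

def decompress(c, mean, components):
    """Map a compressed code back to the original latent space."""
    return c @ components + mean

def edit_latent(code, direction, alpha):
    """Move a code by alpha along a unit-normalized attribute direction."""
    n = direction / np.linalg.norm(direction)
    return code + alpha * n

# Assumption: random vectors stand in for latent codes sampled from a real GAN.
rng = np.random.default_rng(0)
latents = rng.standard_normal((1000, 512))
mean, comps = fit_pca(latents, k=32)

z = latents[0]
c = compress(z, mean, comps)                  # 32-dimensional compressed code
direction = np.eye(32)[0]                     # hypothetical attribute direction
c_edit = edit_latent(c, direction, alpha=3.0)
z_edit = decompress(c_edit, mean, comps)      # new code for the unchanged generator
```

In a real pipeline, `z_edit` would be passed to the pretrained generator in place of `z`. The generator itself is never retrained, which is the constraint both abstracts emphasize.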

