Multimodal Arbitrary Image Style Transfer System

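This thesis studies arbitrary style transfer. We examine the linear style transfer system (LST) proposed by X. Li et al. [15] and improve it from several angles. LST uses a deep neural network to learn to imitate the Whitening and Coloring Transform (WCT) and thereby perform style transfer. In WCT, whitening and coloring are mutually inverse operations, yet LST's training process does not explicitly train for this invertibility. One part of this thesis addresses that gap in order to improve the transfer results. In addition, the transformation matrix T in LST is a global transform. Because the channels should be partly independent of and partly correlated with one another, we propose a split-and-transform scheme. With a greatly reduced number of network parameters, it not only balances channel independence against globality but also yields better transfer results. Finally, we inject random noise into the transformation so that the network can generate diverse transfer results.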
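As a concrete reference for the operation that LST learns to imitate, the following is a minimal NumPy sketch of the classical closed-form WCT computed with EVD (not LST's learned network). It assumes features are stored as a C×N matrix (channels × spatial positions); the function names, the random stand-in features, and the `eps` eigenvalue floor are illustrative choices of ours, not taken from the thesis.

```python
import numpy as np


def _sym_power(cov, power, eps=1e-5):
    """Raise a symmetric PSD matrix to `power` via eigenvalue decomposition (EVD)."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # floor tiny eigenvalues for numerical stability
    return (vecs * vals ** power) @ vecs.T  # V diag(vals^power) V^T


def _stats(f):
    """Channel means, centred features and unbiased covariance of a C x N feature map."""
    mean = f.mean(axis=1, keepdims=True)
    centered = f - mean
    cov = centered @ centered.T / (f.shape[1] - 1)
    return mean, centered, cov


def whiten(content, eps=1e-5):
    """Whitening: remove channel means and decorrelate channels (covariance -> I)."""
    _, centered, cov = _stats(content)
    return _sym_power(cov, -0.5, eps) @ centered


def color(whitened, style, eps=1e-5):
    """Coloring: impose the style feature's covariance and channel means."""
    mean, _, cov = _stats(style)
    return _sym_power(cov, 0.5, eps) @ whitened + mean


def wct(content, style, eps=1e-5):
    """Closed-form WCT: whiten the content feature, then color it with the style statistics."""
    return color(whiten(content, eps), style, eps)


# Usage with random stand-ins for encoder features (C = 8 channels, N positions).
rng = np.random.default_rng(0)
content = rng.standard_normal((8, 256))
style = (np.eye(8) + 0.3 * rng.standard_normal((8, 8))) @ rng.standard_normal((8, 300)) + 2.0
stylized = wct(content, style)
# Coloring with the content's own statistics undoes the whitening.
recovered = color(whiten(content), content)
print(np.allclose(recovered, content))
```

The last two lines illustrate the inverse relationship described above: coloring a whitened feature with its own covariance and mean returns the original feature, which is the invertibility that LST's training does not explicitly enforce.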

Keywords

convolutional neural network; deep learning; image style transfer; WCT; IN; EVD; covariance matrix; orthogonal matrix; Gram matrix

English Abstract

We propose two schemes to improve the linear style transfer system (LST) proposed by Li et al., which employs deep neural networks to learn the Whitening and Coloring Transform (WCT) for style transfer. The schemes tackle two problems in LST: (1) whitening and coloring are inverse operations of each other, but LST's training process does not explicitly enforce this invertibility; and (2) the channels of the content feature may be partly correlated and partly independent, so the single transformation matrix that LST learns and applies to the whole content feature is too global. To strengthen the invertibility between the whitening and coloring operations, we propose an additional identity loss. To balance the globality and locality of the learned transformation, we propose a split-and-transform scheme. Experimental results show that the proposed schemes not only greatly reduce the number of network parameters but also help yield better transfer results.
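The split-and-transform idea and the identity loss can be illustrated with a small NumPy sketch. It assumes the C channels are split into g equal, contiguous groups, each transformed by its own (C/g)×(C/g) matrix, which is equivalent to applying one block-diagonal global matrix. The function names, the grouping, and the identity-loss form (a squared Frobenius penalty on coloring-after-whitening minus I, evaluated when one image supplies both matrices) are our own assumptions, not necessarily the thesis's exact formulation.

```python
import numpy as np


def split_and_transform(feat, mats):
    """Transform each channel group with its own matrix (a block-diagonal transform).

    feat: C x N feature map; mats: g square matrices, each (C/g) x (C/g).
    """
    chunks = np.split(feat, len(mats), axis=0)  # g equal, contiguous channel groups
    return np.concatenate([m @ chunk for m, chunk in zip(mats, chunks)], axis=0)


def matrix_entries(channels, groups):
    """Matrix entries to predict: C^2 for one global T, C^2 / g after splitting."""
    size = channels // groups
    return groups * size * size


def identity_loss(t_whiten, t_color):
    """Hypothetical identity loss: when both matrices come from the same image,
    coloring after whitening should give the identity (squared Frobenius norm)."""
    eye = np.eye(t_whiten.shape[0])
    return float(np.sum((t_color @ t_whiten - eye) ** 2))


# Usage: C = 32 channels split into g = 4 groups of 8 channels each.
rng = np.random.default_rng(1)
feat = rng.standard_normal((32, 100))
mats = [rng.standard_normal((8, 8)) for _ in range(4)]
out = split_and_transform(feat, mats)
print(out.shape, matrix_entries(32, 1), matrix_entries(32, 4))  # (32, 100) 1024 256
```

Setting g = 1 recovers a single global matrix as in LST; a larger g gives up cross-group correlation in exchange for fewer predicted entries, which is one way to read the globality/locality balance described above.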

English Keywords

convolutional neural network; deep learning; image style transfer; covariance; orthogonal matrix; Gram matrix; WCT; IN; EVD

References

[1] T. C. Wang, M. Y. Liu, J. Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs,” arXiv:1711.11585 [cs.CV], Aug. 2018.

[2] T. Miyato and M. Koyama, “CGANs with Projection Discriminator,” arXiv:1802.05637 [cs.LG], Aug. 2018.

[3] J. Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, “Toward Multimodal Image-to-Image Translation,” arXiv:1711.11586 [cs.CV], Oct. 2018.

[4] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” arXiv:1703.10593 [cs.CV], Nov. 2018.

[5] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” arXiv:1611.07004 [cs.CV], Nov. 2018.