

A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Instrumental Music Conversion

Advisor: 吳家麟

Abstract


A well-trained musician can readily improvise an interesting timbre transfer; for example, a violinist can gracefully perform Mozart's Rondo alla Turca or Beethoven's famous piano piece Für Elise. Deep learning models have recently risen to prominence in the image domain, and generative adversarial networks in particular are widely applied to domain-transfer problems. In this thesis, we propose a generative model that improves on the cycle-consistent generative adversarial network (CycleGAN) and apply it to instrumental timbre conversion. Our system introduces multiple adversarial discriminators, each focusing on the fine local details of a different frequency band. We compare the proposed model against the original CycleGAN by examining the converted spectrograms on the MagnaTagATune audio dataset. The proposed model has two strengths: a more reliable discriminator, and a more robust generator that achieves instrumental timbre transfer and produces more natural-sounding music.

Keywords

Timbre Conversion, Deep Learning

Parallel Abstract (Author's English Version)


A trained instrumental musician can easily give an interesting domain-translation performance; for example, a violinist can cover Mozart's Rondo alla Turca or gently perform Beethoven's well-known piano piece Für Elise. Given the success of deep neural networks in image processing, generative adversarial network based models are broadly used for domain-transfer problems. In this thesis, we present a lightweight generative model based on the cycle-consistent adversarial network (CycleGAN) for instrumental music conversion. The proposed model employs multiple discriminators that focus on fine-grained local details of the frequency features. We also evaluate the original CycleGAN model and the proposed multi-discriminator CycleGAN model on the MagnaTagATune dataset. As a result, we obtain 1) a reliable discriminator that reduces the number of parameters and 2) a better generator that can transfer characteristics between different types of musical instruments and generate more natural, domain-specific instrumental music.
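The abstract's central mechanism is a set of discriminators that each judge a different frequency band of the spectrogram rather than the whole input at once. The thesis text here contains no code, so the following is only a minimal NumPy sketch of that band-splitting idea: a toy linear-plus-sigmoid scorer stands in for the real convolutional discriminators, and all function names (`split_bands`, `toy_discriminator`, `multi_discriminator_scores`) are illustrative, not from the thesis.

```python
import numpy as np

def split_bands(spec, n_bands):
    """Split a (freq, time) spectrogram into n_bands roughly equal frequency bands."""
    return np.array_split(spec, n_bands, axis=0)

def toy_discriminator(band, w):
    """A stand-in 'discriminator': one linear score squashed to (0, 1) by a sigmoid.
    The real model would be a small CNN per band; this only shows the data flow."""
    score = float(np.mean(band * w))
    return 1.0 / (1.0 + np.exp(-score))

def multi_discriminator_scores(spec, weights):
    """Score each frequency band with its own discriminator; one score per band."""
    bands = split_bands(spec, len(weights))
    return [toy_discriminator(b, w) for b, w in zip(bands, weights)]
```

In a CycleGAN-style training loop, each per-band score would contribute its own adversarial loss term, so low-frequency and high-frequency artifacts are penalized independently instead of being averaged away by a single full-spectrum discriminator.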

