Variational Autoencoders for Polyphonic Music Interpolation

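This thesis aims to use machine learning techniques to address the novel problem of music interpolation composition. We propose two models based on variational autoencoders that generate a suitable multi-track melody between two given songs, bridging them by smoothly changing pitch and dynamics. The interpolations produced by the first model outperform a random-data baseline and a bidirectional LSTM approach, and its performance is comparable to the current state of the art. The second model, which has a novel architecture, surpasses state-of-the-art interpolation approaches in terms of reconstruction error by using an additional neural network to directly estimate the encoded vector of the interpolation. In addition, we created the Hsinchu Interpolation MIDI Dataset, which makes training both the methods from the literature and the models in this thesis more efficient in terms of computation and time. Finally, we conducted a quantitative user study to ensure the validity of the results.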

Keywords

Variational autoencoder; polyphonic music; interpolation; autoencoder

Parallel Abstract

This thesis aims to use machine learning techniques to solve the novel problem of music interpolation composition. Two models based on Variational Autoencoders (VAEs) are proposed to generate a suitable polyphonic harmonic bridge between two given songs, smoothly changing the pitches and dynamics of the interpolation. The interpolations generated by the first model surpass a random-data baseline and a bidirectional LSTM approach, and its performance is comparable to the current state of the art. The novel architecture of the second model outperforms state-of-the-art interpolation approaches in terms of reconstruction loss by using an additional neural network to directly estimate the encoded vector of the interpolation. Furthermore, the Hsinchu Interpolation MIDI Dataset was created, making both models proposed in this thesis more efficient to train than previous approaches in the literature in terms of computational and time requirements. Finally, a quantitative user study was conducted to ensure the validity of the results.

Parallel Keywords

Variational Autoencoder; Polyphonic music; Interpolation; Autoencoder; VAE


