
Automatic Music Generation Considering Tonality, Harmony, and Phrases

Classical Music Transformer: An Investigation on Tonality, Harmony and Phrase

Advisor: 張智星
Co-advisor: 蘇黎 (Li Su)

Abstract


In machine learning applications, research on music resembles natural language processing, but musical properties such as chords and tonality give music processing additional aspects to consider. Music generation, one task within music processing, aims to let a model automatically generate entirely new music in a short time. In recent years, as machine learning models have advanced rapidly, the field of music generation has flourished with them: generated results are not only smoother, but can also be complex and structured. This thesis treats music as a language. We define musical elements such as pitch and duration as music events, represent each piece as a sequence of these events, train an autoregressive model with a language-modeling objective, and generate music by sampling from the predicted probabilities. We use the Beethoven Piano Sonata with Function Harmony dataset as training data, and we design additional music events from the chord, tonality, and phrase labels annotated by professional musicians in the dataset, so that the model generates music carrying rich musical information. Furthermore, we design loss functions based on the chord and tonality labels to make the generated music more musical. Finally, we design a subjective listening questionnaire to compare models with and without this musical information, both perceptually and in terms of music theory.
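
To make the event representation concrete, here is a minimal sketch of how a short passage could be flattened into one token stream that interleaves notes with the dataset's chord, tonality, and phrase labels. The event names and values (`Tonality`, `Chord`, `Phrase`, `Pitch`, `Duration`) are hypothetical illustrations in the spirit of REMI-style encodings, not the thesis's actual vocabulary.

```python
# A hypothetical, REMI-style event vocabulary illustrating how notes and
# the dataset's chord / tonality / phrase labels can share one sequence.
# Event kinds and values are illustrative, not the thesis's actual tokens.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str   # e.g. "Tonality", "Chord", "Phrase", "Pitch", "Duration"
    value: str  # e.g. "C:maj", "I", "begin", "60", "quarter"

    def token(self) -> str:
        return f"{self.kind}_{self.value}"

# One short passage, flattened into a single left-to-right event stream:
passage = [
    Event("Tonality", "C:maj"),   # key label from the dataset
    Event("Phrase", "begin"),     # phrase-boundary label
    Event("Chord", "I"),          # functional-harmony label (Roman numeral)
    Event("Pitch", "60"),         # MIDI note number for C4
    Event("Duration", "quarter"),
    Event("Pitch", "64"),         # E4
    Event("Duration", "quarter"),
]

# The autoregressive model is trained on integer ids of these tokens.
vocab = {tok: i for i, tok in enumerate(sorted({e.token() for e in passage}))}
ids = [vocab[e.token()] for e in passage]
print(ids)
```

Interleaving the labels this way lets the autoregressive model condition each note prediction on the current key, chord, and phrase position.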

Parallel Abstract


Music generation, one application of machine learning, is similar to natural language processing, but structural information in music, such as chords and tonality, makes the topic unique within this field. The main purpose of music generation is to allow a model to automatically generate brand-new music in a short time. In recent years, with the rapid development of machine learning models, the field of music generation has also developed vigorously: not only are the results smoother, but complex and structured music can also be generated. This thesis treats music in the form of language. We define elements of music such as note pitch and note length as so-called events. Music is converted into sequences of events, modeled by a transformer-based autoregressive model, and sampled from the predicted probabilities for generation. Beethoven Piano Sonata with Function Harmony, a symbolic dataset with rich musical labels, is used in this work. We use the chord, tonality, and phrase labels marked by professional musicians in the dataset to design additional events and generate music with rich musical information. In addition, we design loss functions based on the chord and tonality labels to make the generated music more harmonically coherent. Finally, we devise an online questionnaire to compare whether the musical information in the model makes a difference in listening perception and in music theory.
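
The abstract does not give the exact form of the label-based loss, so the following PyTorch sketch shows only one plausible formulation: the standard next-token cross-entropy plus an auxiliary penalty on the probability mass assigned to pitch tokens inconsistent with the labeled chord or key at each step. The function name, the mask `in_harmony_mask`, and the weight `aux_weight` are assumptions for illustration, not the thesis's exact design.

```python
# A minimal PyTorch sketch of a chord/tonality-aware training loss.
# This is one plausible reading of "designing the loss function through
# chord and tonality labels", not the thesis's exact formulation.

import torch
import torch.nn.functional as F

def harmony_aware_loss(logits, targets, in_harmony_mask, aux_weight=0.1):
    """
    logits:          (batch, seq_len, vocab) raw model outputs
    targets:         (batch, seq_len) ground-truth next-token ids
    in_harmony_mask: (batch, seq_len, vocab) 1.0 where a token is
                     consistent with the labeled chord/key at that step
                     (non-pitch tokens are also marked 1.0)
    """
    # Standard language-model objective: next-token cross-entropy.
    ce = F.cross_entropy(logits.transpose(1, 2), targets)

    # Auxiliary term: penalize probability mass on out-of-harmony pitches.
    probs = logits.softmax(dim=-1)
    out_of_harmony = (probs * (1.0 - in_harmony_mask)).sum(dim=-1)
    aux = out_of_harmony.mean()

    return ce + aux_weight * aux
```

At generation time, the same predicted distribution is simply sampled token by token (e.g. with a temperature) to produce new sequences.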

