Music generation, one application of machine learning, resembles natural language processing, yet musical properties such as chords and tonality give the field additional dimensions to consider. The main goal of music generation is to let a model automatically produce brand-new music in a short time. In recent years, with the rapid development of machine learning models, the field has flourished: generated results are not only smoother, but complex and structured music can also be produced. This thesis treats music as a language. We define musical elements such as note pitch and note duration as so-called events, convert each piece into a sequence of events, train a transformer-based auto-regressive model with a language-modeling objective, and sample from the predicted probability distribution to generate music. We use the Beethoven Piano Sonata with Functional Harmony corpus, a symbolic dataset with rich musical labels, as training data. The chord, tonality, and phrase labels annotated by professional musicians in the dataset are used to design additional events for the model, so that the generated music carries rich musical information. In addition, we design loss functions based on the chord and tonality labels to make the generated music more harmonically coherent. Finally, we devise a subjective online questionnaire to compare, both perceptually and in terms of music theory, the music generated by models with and without this musical information.
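The event-based representation and sampling step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the event names, vocabulary, and duration units are invented here for clarity, and a real model would predict the probability vector with a transformer rather than receive it as an argument.

```python
import random

# Hypothetical event vocabulary (the thesis defines its own event set from
# pitch, duration, chord, tonality, and phrase labels; these are illustrative).
EVENTS = (
    ["NOTE_ON_%d" % p for p in range(21, 109)]       # piano pitches (MIDI 21-108)
    + ["DURATION_%d" % d for d in (1, 2, 4, 8, 16)]  # note lengths in ticks
    + ["CHORD_%s" % c for c in ("I", "IV", "V")]     # harmony labels
    + ["BAR", "PHRASE_END"]                          # structural markers
)
TOKEN_TO_ID = {e: i for i, e in enumerate(EVENTS)}
ID_TO_TOKEN = {i: e for e, i in TOKEN_TO_ID.items()}

def encode(events):
    """Map a list of event strings to integer ids for the model."""
    return [TOKEN_TO_ID[e] for e in events]

def sample_next(probs, temperature=1.0, rng=random):
    """Sample the next event id from a predicted distribution.

    `probs` is the model's probability vector over the vocabulary;
    temperature > 1 flattens it, temperature < 1 sharpens it.
    """
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1
```

Generation then alternates between feeding the encoded sequence to the model and appending the sampled event, until a stopping condition (for example a fixed length or a phrase-end event) is reached.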
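One simple way a loss term could be built from chord and tonality labels is to penalize probability mass that the model assigns to pitches outside the labeled harmony, added to the usual language-modeling cross-entropy. The following sketch assumes a toy vocabulary where each token maps to a pitch class; the thesis designs its own loss functions, and this is only an illustrative shape for such a term.

```python
import math

# Pitch classes (0 = C ... 11 = B); the allowed set would come from the
# dataset's chord/tonality label at each step. C major is used as an example.
C_MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}

def cross_entropy(probs, target_id):
    """Standard language-modeling loss for one prediction step."""
    return -math.log(probs[target_id] + 1e-12)

def harmony_penalty(probs, allowed_pitch_classes, pitch_class_of):
    """Hypothetical auxiliary term: total probability mass assigned to
    tokens whose pitch class falls outside the labeled key/chord.
    `pitch_class_of(i)` returns the pitch class of token i, or None for
    non-pitch tokens (bars, chords, etc.), which are not penalized."""
    return sum(
        p for i, p in enumerate(probs)
        if pitch_class_of(i) is not None
        and pitch_class_of(i) not in allowed_pitch_classes
    )

def total_loss(probs, target_id, allowed, pitch_class_of, lam=0.5):
    """Weighted sum of the language-modeling loss and the harmony term."""
    return cross_entropy(probs, target_id) + lam * harmony_penalty(
        probs, allowed, pitch_class_of
    )
```

The weight `lam` trades off fluency against harmonic conformance; in this sketch it is an assumed hyperparameter, not a value from the thesis.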