Melody Extraction for MIDI Data

Advisor: 丁肇隆
Co-advisor: 黃乾綱 (Chien-Kang Huang)


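Among the many problems in music information retrieval, melody extraction is a crucial one. As one of the most fundamental elements of musical structure, melody plays a key role in both composition and analysis. In this thesis, we propose a data-driven system that builds and trains a deep learning model to predict the affinity between notes in MIDI music. The system treats the notes as vertices of a graph and the learned affinities as edges between those vertices, thereby converting a MIDI piece into an undirected graph. We split a MIDI piece into several segments and use a graph clustering algorithm to partition the undirected graph into two subgraphs, which represent the melody and the accompaniment of a segment, respectively. Finally, we use the voting method from ensemble learning to merge the melodies of all segments into one complete melody, which is the final output of the melody extraction system. To demonstrate the feasibility of the system, we perform hyperparameter optimization on the deep learning model, validate the melody extraction results on different datasets, and compare them with other existing melody extraction models, showing the generalization ability of the system across datasets.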


Melody extraction is a crucial task in music information retrieval, since melody is one of the most important elements in musical composition and analysis. In this thesis, we propose a data-driven framework in which a 1-D convolutional neural network learns affinities between musical notes. A music piece in MIDI format can then be represented as a weighted undirected graph, with notes as vertices and the learned affinity values as edge weights. The piece is divided into segments, and the melody track and the accompaniment track of each segment are obtained by partitioning the learned graph into two subgraphs with spectral clustering. Finally, we use a voting system to merge the melodies of all segments into an integrated one as the final result of melody extraction. Our proposed framework takes only musical notes as input, without further information (e.g., time signature and key signature), so it can be applied to both well-labeled data and real-world performance data. The framework is tested and validated on multiple datasets with different hyperparameter settings, and its performance is compared with other rule-based and data-driven methods.
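The partition-and-vote pipeline described above can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: `toy_affinity` is a hypothetical pitch-distance stand-in for the trained 1-D convolutional network, the two-way split uses the sign of the second eigenvector of the normalized Laplacian (a simplified form of spectral clustering), the melody cluster is taken to be the one with the higher mean pitch, and the segments are assumed to overlap so that each note can receive several votes.

```python
import numpy as np


def toy_affinity(pitches, scale=6.0):
    """Hypothetical stand-in for the learned affinity model (assumption):
    notes that are close in pitch get an affinity near 1."""
    p = np.asarray(pitches, dtype=float)
    return np.exp(-np.abs(p[:, None] - p[None, :]) / scale)


def bipartition(affinity):
    """Split a weighted undirected graph into two groups using the sign of
    the second eigenvector of the symmetric normalized Laplacian."""
    W = np.asarray(affinity, dtype=float)
    W = (W + W.T) / 2.0              # enforce symmetry (undirected graph)
    np.fill_diagonal(W, 0.0)         # no self-loops
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return fiedler > 0


def extract_melody(pitches, segments, affinity_fn=toy_affinity):
    """Partition each segment, then merge the results by majority vote."""
    pitches = np.asarray(pitches, dtype=float)
    votes = np.zeros(len(pitches))
    counts = np.zeros(len(pitches))
    for idx in segments:
        idx = np.asarray(idx)
        seg_pitches = pitches[idx]
        group = bipartition(affinity_fn(seg_pitches))
        # Assumption: the cluster with the higher mean pitch is the melody.
        if seg_pitches[group].mean() < seg_pitches[~group].mean():
            group = ~group
        votes[idx] += group
        counts[idx] += 1
    # A note is melody if more than half of its segments voted for it.
    return votes * 2 > counts


# Toy piece: melody and accompaniment notes alternate in time.
pitches = [72, 48, 74, 52, 76, 55, 77, 50, 79, 53]
# Overlapping windows of six notes with a hop of two notes (assumption).
segments = [range(0, 6), range(2, 8), range(4, 10)]
melody_mask = extract_melody(pitches, segments)
print(melody_mask.astype(int))
```

In this toy piece every even-indexed note is high and every odd-indexed note is low, so the vote marks the even-indexed notes as melody. With a learned affinity model in place of `toy_affinity`, the same partition and voting steps would apply unchanged.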


