  • Thesis

A Study for Improving Large-vocabulary Automatic Chord Recognition (自動和弦辨識針對大詞彙集改善之研究)

Advisor: 黃乾綱

Abstract


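Chord recognition is an important skill in music composition, performance, and singing. In the past it was mostly done by manual annotation, which not only takes a great deal of labor and time but also requires considerable professional musical knowledge. This study therefore proposes an automatic chord recognition (ACR) system that applies to both small-vocabulary and large-vocabulary chord recognition. It raises the recognition score on small vocabularies while also improving performance on large-vocabulary chords, both by increasing the number of recognizable chord types and by raising the evaluation scores.

Deep neural network architectures have become mainstream in current ACR research, and different models can be built for different needs. In our experiments we use three publicly available datasets as training and test data, and design a feature extractor based on a convolutional neural network, combined with a decoding model built from a bi-directional long short-term memory network and conditional random fields. We run experiments separately on small-vocabulary and large-vocabulary chords. In the small-vocabulary experiment, the WCSR (Weighted Chord Symbol Recall) score averages 84.3%, up to 8.8% higher than other deep-learning-based models. This shows that the feature extractor in our model learns more accurate features and, together with the decoding model, raises the recognition rate. In the large-vocabulary experiment, we increase the number of evaluation metrics from one to six and expand the set of recognizable chord types while maintaining the original small-vocabulary recognition rate. The model obtains a WCSR of 71.5% on the seventh-chord metric and 66.1% on the tetrad metric, about 1-2% higher than other models.

To further improve large-vocabulary recognition, we add two methods. First, we add new training data for the scarce seventh chords to address the unbalanced chord distribution in the existing datasets, obtaining a WCSR of 72.1% on the seventh-chord metric, 0.6% higher than before. Second, we set aside the flat-classification concept used in most current research and return to the original, precise definition of a chord: we design a threshold-based rule decision method on the key notes that determine the chord type and use it to evaluate these complex extended chords. This yields a WCSR of 74.5% on the seventh-chord metric, a total improvement of 3%, and 68.4% on the tetrad metric, a total improvement of 2.3%. The method also recognizes inverted chords, tripling the number of recognizable chords. These two sets of experiments verify the generality of the model, improve the recognition rate on large-vocabulary chords, and increase the number of recognizable chords.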
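The WCSR figures quoted in the abstract measure the fraction of annotated time whose estimated chord label matches the reference. The sketch below illustrates that idea under simplifying assumptions that are not taken from the thesis: segments are `(start, end, label)` tuples, estimated segments do not overlap one another, and labels are compared as exact strings. The function name `wcsr` and the `Segment` alias are hypothetical. A real evaluation would first map labels onto the chosen vocabulary (for example sevenths or tetrads) before comparing them.

```python
from typing import List, Tuple

# (start_sec, end_sec, chord_label); a hypothetical representation for this sketch
Segment = Tuple[float, float, str]


def wcsr(reference: List[Segment], estimate: List[Segment]) -> float:
    """Return the share of reference duration that is labeled correctly.

    Assumes estimate segments do not overlap each other and that labels
    are already mapped to a common vocabulary (exact string match).
    """
    total = sum(end - start for start, end, _ in reference)
    if total <= 0:
        raise ValueError("reference annotation has zero duration")

    correct = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            # Length of the time interval shared by the two segments
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and r_label == e_label:
                correct += overlap
    return correct / total


reference = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "G:maj")]
estimate = [(0.0, 1.0, "C:maj"), (1.0, 4.0, "G:maj")]
score = wcsr(reference, estimate)
```

In this toy example the estimate is correct for 3 of the 4 annotated seconds, so `score` is 0.75.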

Parallel Abstract


Chord recognition is an essential skill for composers and performers. In the past, most chord annotations were produced by hand, which requires not only a great deal of labor and time but also considerable professional musical knowledge. In this thesis, we propose an automatic chord recognition (ACR) system that can be applied to both small-vocabulary and large-vocabulary regimes. It not only improves the score on small vocabularies but also improves performance on large vocabularies, both in the number of chords that can be recognized and in the evaluation scores. Deep learning architectures have become the mainstream approach to ACR tasks, and different models can be built for different needs. We use three public datasets as training and test data, and we design a convolutional neural network-based feature extractor with a bidirectional long short-term memory and conditional random field (BiLSTM-CRF) decoding model, evaluated separately on small and large vocabularies. In the small-vocabulary experiment, the average WCSR (Weighted Chord Symbol Recall) is 84.3%, up to 8.8% higher than other neural-network-based models. This shows that the feature extractor we designed learns more accurate features and, combined with the decoding model, improves the WCSR score. In the large-vocabulary experiment, we increased the number of evaluation metrics from one to six and maintained the recognition score on the original small-vocabulary set while expanding the types of recognizable chords. We obtained 71.5% on the seventh metric and 66.1% on the tetrad metric, about 1-2% higher than other baseline models. To further improve the recognition score on large vocabularies, we also designed two methods.
First, we added a new dataset for the scarce seventh chords to address the unbalanced distribution of chords in the existing datasets. This gave a WCSR score of 72.1% on the seventh metric, an increase of 0.6% over the original. Second, we set aside the flat classification used in most current research and returned to the original, precise definition of chords, designing a threshold-based decision method for evaluating these complex chords. In this experiment, we obtained a WCSR score of 74.5% on the seventh metric, a 3% improvement in total, and 68.4% on the tetrad metric, a 2.3% improvement in total. The method also recognizes inverted chords, tripling the number of recognizable chords. The experiments show that the model works for both regular major/minor triads (small vocabulary) and larger chord vocabularies, and that it improves the recognition accuracy of complex chords.
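The abstract does not spell out the threshold method, so the following is only an illustrative sketch of how a chord quality could be decided from its defining notes instead of by flat classification. Everything here is an assumption and not the thesis's actual rule set: the function name `decide_quality`, the 12-element pitch-class activation vector, the 0.5 threshold, the interval rules, and the output labels. For brevity the sketch checks only the third and the seventh and ignores the fifth and the bass note.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical mapping from (third, seventh) decisions to a quality label
QUALITIES = {
    ("maj", None): "maj",
    ("min", None): "min",
    ("maj", "b7"): "7",
    ("maj", "7"): "maj7",
    ("min", "b7"): "min7",
    ("min", "7"): "minmaj7",
}


def decide_quality(activations: Sequence[float], root: int,
                   threshold: float = 0.5) -> str:
    """Decide a chord quality from 12 pitch-class activations (sketch only)."""
    if len(activations) != 12:
        raise ValueError("expected one activation per pitch class")

    def strongest(intervals: Tuple[int, ...]) -> Optional[int]:
        # Pick the most active candidate interval above the root,
        # but only accept it if it reaches the threshold.
        value, interval = max(
            (activations[(root + i) % 12], i) for i in intervals
        )
        return interval if value >= threshold else None

    third = strongest((3, 4))      # minor third vs. major third
    if third is None:
        return "N"                 # no third found: treated as no-chord here
    seventh = strongest((10, 11))  # minor seventh vs. major seventh

    third_name = "maj" if third == 4 else "min"
    seventh_name = {None: None, 10: "b7", 11: "7"}[seventh]
    return QUALITIES[(third_name, seventh_name)]


def one_hot(pitch_classes: Sequence[int]) -> list:
    """Build a toy activation vector with 0.9 on the given pitch classes."""
    return [0.9 if pc in pitch_classes else 0.1 for pc in range(12)]


c_major_seventh = decide_quality(one_hot([0, 4, 7, 11]), root=0)
g_dominant_seventh = decide_quality(one_hot([7, 11, 2, 5]), root=7)
a_minor = decide_quality(one_hot([9, 0, 4]), root=9)
```

Because each defining note is decided separately, adding a new chord type only requires a new entry in the mapping, not a new output class. This is one plausible reading of why such a rule-based scheme can enlarge the recognizable vocabulary.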

