Vibrato Detection for Monophonic Vocals and Polyphonic Music

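Vibrato is a regular fluctuation of pitch used as a musical effect. It is characterized by the extent of the pitch variation and by its rate, and it occurs frequently in musical forms such as folk song and opera. This thesis explores feasible feature extraction, experiments with a range of candidate classifiers, and aims to establish a sound method for vibrato detection. It consists of two parts, "vibrato detection for monophonic vocals" and "vibrato detection for polyphonic music". The two cases call for different approaches, so suitable features and classifiers are sought for each separately, with the goal of high recognition performance. For monophonic vocals, features are extracted from pitch and volume, following the definition of vibrato: the amplitude, frequency, and fitting error obtained from a sine fit serve as feature parameters. Suitable features are then chosen from these through feature selection and related experiments, and static classifiers are compared with hidden Markov models (HMMs) in terms of recognition rate and area under the curve (AUC) to find a suitable classification method. For polyphonic music, pitch tracking cannot be made accurate, so a different feature extraction method is needed. Here low-level descriptors (LLDs) and their delta regression coefficients are used, from which features based on extremes, moments, the discrete cosine transform, the discrete Fourier transform, and others are derived. Suitable features and classifiers are then selected experimentally.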
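As an illustration of the sine-fit features mentioned above (amplitude, frequency, and fitting error), the following is a minimal sketch and not the thesis's actual implementation. It assumes the pitch contour is given in semitones at a fixed frame rate, and it searches a hypothetical grid of vibrato rates from 3 to 9 Hz; for each candidate rate it solves a linear least-squares problem for a sine, a cosine, and a constant offset. The names `sine_fit` and `solve3` are introduced here for illustration only.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def sine_fit(pitch, frame_rate, rates=None):
    """Fit pitch[t] ~ a*sin(2*pi*f*t) + b*cos(2*pi*f*t) + c for each candidate f.

    Returns (amplitude, rate_hz, rms_error) of the candidate with the
    smallest RMS error. The 3-9 Hz default grid is an assumption.
    """
    if rates is None:
        rates = [3.0 + 0.1 * k for k in range(61)]
    times = [i / frame_rate for i in range(len(pitch))]
    best = None
    for f in rates:
        basis = [
            (math.sin(2 * math.pi * f * t), math.cos(2 * math.pi * f * t), 1.0)
            for t in times
        ]
        # Normal equations (B^T B) w = B^T y for the three coefficients.
        A = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
        b = [sum(r[i] * y for r, y in zip(basis, pitch)) for i in range(3)]
        a_sin, a_cos, offset = solve3(A, b)
        residual = [
            y - (a_sin * r[0] + a_cos * r[1] + offset) for r, y in zip(basis, pitch)
        ]
        rms = math.sqrt(sum(e * e for e in residual) / len(pitch))
        amplitude = math.hypot(a_sin, a_cos)
        if best is None or rms < best[2]:
            best = (amplitude, f, rms)
    return best


# Synthetic one-second contour: 0.5-semitone vibrato at 6 Hz around pitch 60.
contour = [60 + 0.5 * math.sin(2 * math.pi * 6 * i / 100) for i in range(100)]
amplitude, rate, error = sine_fit(contour, frame_rate=100)
```

In this sketch a segment with vibrato would be expected to give a sizeable amplitude at a plausible rate with a small fitting error, whereas a flat or irregular contour would give a small amplitude or a large error. How the three values are actually thresholded or fed to a classifier is left to the thesis.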

Keywords

vibrato detection; hidden Markov model; sine fitting; low-level descriptor; area under the curve

Parallel Abstract

Vibrato is a musical effect consisting of a regular, pulsating change of pitch. It is characterized by its extent and its rate, and it appears frequently in folk song and opera. This thesis explores feasible methods of feature extraction, experiments with several kinds of classifier, and aims to build a reliable approach to vibrato detection. The work covers two cases, "vibrato detection for monophonic vocals" and "vibrato detection for polyphonic music". To obtain a high recognition rate, different feature extraction methods and classifiers are tried for each case. For monophonic vocals, features are extracted from pitch and volume: a sine fit yields the amplitude, frequency, and fitting error of the pulsating change. Feature selection and related experiments are then used to choose suitable features. The recognition rate and area under the curve (AUC) of static classifiers are compared with those of a hidden Markov model (HMM) to find an appropriate classifier. For polyphonic music, pitch tracking cannot be made accurate, so a new feature extraction method is needed. Low-level descriptors (LLDs) and delta regression coefficients are used to derive features based on extremes, moments, the discrete cosine transform, the discrete Fourier transform, and so on. Experiments are then carried out to select appropriate features and an appropriate classifier.
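To make the polyphonic feature pipeline more concrete, here is a small sketch of delta regression coefficients computed over one low-level descriptor track, followed by a few extreme and moment functionals. It assumes the widely used regression formula d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2) with edge frames repeated; the abstract does not say which variant, window width, or functionals the thesis uses, so the function names, the width of 2, and the selection of statistics are illustrative assumptions.

```python
def delta(series, width=2):
    """Delta regression coefficients of a 1-D descriptor track.

    Frames beyond either end are replaced by the first or last frame.
    The window half-width of 2 is an assumed default.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(series) - 1
    out = []
    for t in range(len(series)):
        acc = 0.0
        for n in range(1, width + 1):
            ahead = series[min(t + n, last)]
            behind = series[max(t - n, 0)]
            acc += n * (ahead - behind)
        out.append(acc / denom)
    return out


def functionals(series):
    """A few extreme and moment statistics over a segment (illustrative subset)."""
    count = len(series)
    mean = sum(series) / count
    variance = sum((x - mean) ** 2 for x in series) / count
    low, high = min(series), max(series)
    return {"min": low, "max": high, "range": high - low,
            "mean": mean, "variance": variance}


# A linear ramp has a constant slope, so interior delta values equal 1.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ramp_delta = delta(ramp)
stats = functionals([1.0, 2.0, 3.0])
```

A segment-level feature vector could then be formed by applying such functionals to each descriptor track and to its delta track; transform-based features (DCT, DFT) would be added in the same way.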


