
Discriminative Feature Extraction for Robust Audio Event Classification

Advisor: 廖元甫

Abstract


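Non-speech audio events carry meaningful information in certain environments. This thesis investigates, for non-speech audio event classification, whether there are audio features beyond the commonly used Mel-frequency cepstral coefficients (MFCCs) that better capture the distinctive information in non-speech audio, and which feature combinations improve recognition performance in noisy environments. To this end, we extract audio features using the concepts of time-frequency analysis and pattern features: we use Gabor filter features, or data-driven filter features derived through analysis methods such as principal component analysis (PCA) and linear discriminant analysis (LDA), as our new type of audio features. Finally, we fine-tune the resulting features with the minimum classification error (MCE) criterion, aiming to obtain more discriminative audio features.

The experimental corpus consists of 105 kinds of clean audio events from RWCP (Real World Computing Partnership). After adding noise following the Aurora 2 multi-condition setup, we train and test models with the features we designed. The experiments show that the new audio features reduce the classification error rate from 4.13% with conventional audio features to 3.17%. The system architecture based on the new type of features therefore achieves robust audio event classification, and the results confirm that these features are suitable for non-speech signals.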
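As a quick check on the size of the reported improvement, the relative error reduction can be worked out from the two error rates. The sketch below is illustrative only; `relative_error_reduction` is a hypothetical helper name, not something taken from the thesis.

```python
def relative_error_reduction(baseline_pct, new_pct):
    """Return the relative reduction in error rate, as a percentage of the baseline."""
    return 100.0 * (baseline_pct - new_pct) / baseline_pct


# Error rates reported in the abstract: 4.13% (conventional features)
# versus 3.17% (new features).
reduction = relative_error_reduction(4.13, 3.17)
print(f"Relative error reduction: {reduction:.1f}%")
```

The absolute gain is 0.96 percentage points, which corresponds to a relative reduction of about 23% against the conventional-feature baseline.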

Parallel Abstract


Traditionally, audio event classification has relied heavily on MFCC (Mel-Frequency Cepstral Coefficient) features. However, MFCCs were originally designed for automatic speech recognition, so it is unclear whether they are still the best features for audio event classification. Moreover, MFCCs are usually not robust in noisy environments. This thesis therefore proposes several new feature extraction methods, with the aim of achieving better performance and robustness than MFCCs in noisy conditions. The proposed methods are based mainly on the concept of matched filters in the spectro-temporal domain. Several ways of designing the set of matched filters are proposed: handcrafted Gabor filters and three data-driven filter designs using PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis)-based eigenspace analysis, and MCE (Minimum Classification Error) training. The robustness of the proposed methods is evaluated on the RWCP (Real World Computing Partnership) database, which contains 105 different audio events, with artificially added noise. The experimental settings are similar to the Aurora 2 multi-condition training task. Experimental results show that the MCE method achieved the lowest average error rate, 3.17%, compared with 4.13% for conventional MFCCs. These results support the superiority and robustness of the proposed audio feature extraction approaches.
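To make the idea of a spectro-temporal filter concrete, here is a minimal sketch of a two-dimensional Gabor filter applied to a spectrogram patch. It is not the thesis's implementation: the abstract does not give the filter parameterization, so the function names, the patch size, and the parameters (`omega_f`, `omega_t`, `sigma_f`, `sigma_t`) are all assumptions chosen for illustration.

```python
import math


def gabor_kernel(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Real part of a 2-D Gabor filter: a Gaussian envelope times a cosine carrier.

    Rows index frequency bins and columns index time frames (an assumed layout).
    """
    center_f = (n_freq - 1) / 2
    center_t = (n_time - 1) / 2
    kernel = []
    for i in range(n_freq):
        row = []
        for j in range(n_time):
            df = i - center_f
            dt = j - center_t
            envelope = math.exp(-0.5 * ((df / sigma_f) ** 2 + (dt / sigma_t) ** 2))
            carrier = math.cos(omega_f * df + omega_t * dt)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Inner product of a spectrogram patch with the kernel (one filter output)."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


# Hypothetical 5x5 filter tuned to temporal modulation only (omega_f = 0).
kernel = gabor_kernel(5, 5, omega_f=0.0, omega_t=math.pi / 2, sigma_f=1.5, sigma_t=1.5)

# A patch shaped like the filter responds more strongly than a flat patch.
flat_patch = [[1.0] * 5 for _ in range(5)]
print(filter_response(kernel, kernel) > filter_response(flat_patch, kernel))
```

In a full system, one such response would be computed for each filter in a bank and each position on the spectrogram, and the responses would form the feature vector. The data-driven variants mentioned in the abstract would replace the handcrafted kernel with one learned from training patches.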

