
Applying Hidden Markov Model and Observable Markov Model for Audio Content Identification

Advisor: 簡福榮

Abstract


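This thesis investigates the use of the Hidden Markov Model (HMM) and the Observable Markov Model (OMM) for identifying audio signals. In the experiments, each state of a Markov model is represented by a set of Gaussian mixture probability density functions that classify the observed audio, and Mel-Frequency Cepstral Coefficients (MFCC) are used as the features that describe the audio. The audio content identification framework consists of two phases: a database training phase and an identification phase. The audio database used in the experiments is divided into 12 categories: nine solo instruments, symphony, and male and female singing. The classifiers compared are the Gaussian Mixture Model (GMM), the HMM, and the OMM. The experimental results show that, compared with the combination of HMM and MFCC, the combination of OMM and MFCC runs faster. Even under various distortions, such as cropping, MP3 compression, AAC compression, amplitude distortion, and changes in time length, it still achieves high accuracy that approaches that of the HMM.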

Parallel Abstract


In this thesis, both the Hidden Markov Model (HMM) and the Observable Markov Model (OMM) are developed as audio fingerprints for each audio signal. Each state of both Markov models is characterized by a set of Gaussian mixture probability density functions, and Mel-Frequency Cepstral Coefficients (MFCC) are used as the audio features in the experiments. The framework consists of two phases: a database training phase and an identification phase. The audio database used in the experiments is divided into 12 categories: nine solo musical instruments, symphony, and male and female singing. Three classifiers are investigated: the Gaussian Mixture Model (GMM), the HMM, and the OMM. The experimental results show that the OMM(MFCC) scheme executes faster than the HMM(MFCC) scheme and degrades gracefully under various distortions, such as cropping, MP3 compression, AAC compression, amplitude modification, and time-scale modification.
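
The two-phase framework with an observable Markov model can be illustrated by a minimal sketch. It is not the thesis implementation. It assumes that each MFCC frame has already been mapped to the index of its most likely Gaussian mixture class, so that every audio clip becomes a sequence of directly observed state indices; the abstract does not state how this mapping is done. The function names `train_omm`, `log_likelihood`, and `identify`, the add-one smoothing, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math


def train_omm(sequences, n_states, alpha=1.0):
    """Training phase: estimate log initial and log transition probabilities.

    sequences: lists of observed state indices (assumed to come from
               quantizing MFCC frames; that step is not shown here).
    alpha:     additive smoothing so unseen transitions keep nonzero mass.
    """
    init = [alpha] * n_states
    trans = [[alpha] * n_states for _ in range(n_states)]
    for seq in sequences:
        if not seq:
            continue
        init[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    init_total = sum(init)
    log_init = [math.log(c / init_total) for c in init]
    log_trans = [[math.log(c / sum(row)) for c in row] for row in trans]
    return log_init, log_trans


def log_likelihood(model, seq):
    """Log-probability of an observed, non-empty state sequence under one model."""
    log_init, log_trans = model
    score = log_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        score += log_trans[prev][cur]
    return score


def identify(models, seq):
    """Identification phase: return the label whose model scores seq highest."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))


# Toy database with two hypothetical classes and three states.
models = {
    "steady": train_omm([[0, 0, 0, 1, 1, 1, 2, 2, 2]], n_states=3),
    "alternating": train_omm([[0, 1, 0, 1, 2, 1, 2, 1, 0]], n_states=3),
}
print(identify(models, [0, 0, 1, 1, 2, 2]))
```

Because the states are observed directly, scoring a clip is a single pass of table lookups with no forward or Viterbi recursion over hidden states, which is one plausible reason an observable model would run faster than an HMM.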
