
基於聲紋之未知音樂檔案辨識與分群研究

A Study of Identifying and Clustering Unknown Music Recordings Based on Audio Fingerprinting

Advisor: 蔡偉和

Abstract


With the rapid development of the Internet, music now spreads virtually without bounds. After a song has been transcoded, transmitted, and downloaded countless times, its signal often differs greatly from that of the original CD, and even its file name may no longer be traceable. Using the audio fingerprint of a music excerpt itself to locate files of the same origin in a database is therefore a practical yet highly challenging problem. This thesis employs a statistical parametric modeling approach, Gaussian mixture models (GMM), to analyze and compare signals, and examines several acoustic features that are commonly used and effective in audio fingerprinting, including Mel-frequency cepstral coefficients (MFCC), spectral centroid (SC), and Renyi entropy (RE). We integrate two complementary features, MFCC and spectral centroid, into a new feature named Mel Spectral Centroid Cepstral Coefficients (MSCCC). Cross-matching experiments between reference and test samples subjected to various kinds of signal distortion confirm that MSCCC is more robust to common distortions. Beyond identifying the source of a music excerpt, we also investigate the automatic clustering of unknown songs, aiming to separate recordings of the same origin from recordings of different origins in an unorganized database. We first measure the similarity between music excerpts using statistical parametric modeling and acoustic feature extraction, so that excerpts taken from the same song but subjected to different types of distortion exhibit high mutual similarity; a hierarchical clustering method then merges highly similar excerpts into the same cluster and assigns dissimilar excerpts to different clusters. Experimental results confirm the feasibility of the proposed methods.
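The abstract describes the identification step only at a high level, so the sketch below is purely illustrative. It assumes the librosa and scikit-learn packages and uses plain MFCC as a stand-in for the thesis's MSCCC feature, whose exact construction is not given here; one GMM is fitted per reference recording, and a query excerpt is assigned to the reference whose model gives the highest average frame log-likelihood.

```python
# Hypothetical sketch of GMM-based music identification (not the thesis's exact code).
# MFCC stands in for MSCCC, which is not specified in this abstract.
import librosa
from sklearn.mixture import GaussianMixture

def extract_features(path, n_mfcc=20):
    """Frame-level MFCC features: rows are frames, columns are coefficients."""
    y, sr = librosa.load(path, sr=None, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_reference_models(reference_paths, n_components=32):
    """Fit one diagonal-covariance GMM per reference recording in the database."""
    models = {}
    for path in reference_paths:
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(extract_features(path))
        models[path] = gmm
    return models

def identify(query_path, models):
    """Return the reference whose GMM gives the highest mean log-likelihood for the query."""
    feats = extract_features(query_path)
    return max(models, key=lambda ref: models[ref].score(feats))
```

Robustness would then be assessed, as in the abstract, by querying with resampled, compressed, or noise-corrupted versions of the reference songs.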

Parallel Abstract


Supported by rapid progress in computer and network technology, popular music has become one of the most prevalent data types carried over the Internet. With digitization, a single song may exist as numerous audio files of different formats and qualities, arising from playback/recording, encoding/decoding, and transmission. As a result, identifying a song from a piece of “distorted” music has become a challenging research problem, usually termed audio fingerprinting. In this study, we propose a robust audio feature, called Mel Spectral Centroid Cepstral Coefficients (MSCCC), used in conjunction with the Gaussian mixture modeling technique to deal with this problem. After validation with various kinds of distorted music, such as sampling-rate change, compression, and noise corruption, we show that the proposed MSCCC outperforms conventional MFCC-based features. In addition to the identification problem, this thesis also investigates whether a collection of unknown music recordings can be partitioned into clusters such that each cluster contains recordings from the same song. We develop a method to measure the similarities between music recordings and use hierarchical agglomerative clustering to group together recordings deemed similar to each other. Our experiments show that most of the music recordings from the same song can be grouped into a single cluster.
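As a companion to the clustering description above, the following hypothetical sketch groups recordings with SciPy's hierarchical agglomerative clustering. The pairwise similarity matrix is an assumed input (for instance, rescaled cross GMM log-likelihoods); the abstract does not detail how the similarities or the cluster cut-off are actually computed.

```python
# Hypothetical sketch: average-linkage agglomerative clustering of music recordings
# from a pairwise similarity matrix (assumed to lie in [0, 1], 1 = most alike).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_recordings(similarity, cut=0.5):
    """Return one cluster label per recording.

    Recordings whose mutual distance (1 - similarity) stays below `cut`
    are merged into the same cluster; dissimilar ones end up apart.
    """
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)   # condensed form expected by linkage
    tree = linkage(condensed, method="average")      # agglomerative, average linkage
    return fcluster(tree, t=cut, criterion="distance")
```

Ideally, all distorted copies of one song stay mutually close and end up in a single cluster, which is the behaviour the abstract reports for most recordings.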


Cited by


薛宇志 (2010). 依照鳥類鳴叫與鳴唱聲識別其種類 [Master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2010.00179
