Voice Comparison Using Normalized Spectrograms of Character Sounds

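Voice is one of the important biometric traits and can be used for identity recognition. This paper proposes a method for normalizing the spectrogram of a spoken character sound. Normalization removes the effects on the spectrogram of differences in speaking rate, loudness, and even the frequency response of the recording device, so that the spectrogram reflects only the differences in timbre between speakers. We assess the similarity between two character sounds by computing the correlation coefficient between two spectrograms or between figures derived from them. We applied the method to speaker verification in an experiment with 70 participants. When only one sentence was used for comparison, the accuracy was about 95%. When seven sentences were used, the accuracy reached 99%.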

Keywords

speaker verification; spectrogram; normalization; formant; correlation coefficient

Parallel Abstract

Voice is one of the primary biometrics and is commonly used to identify a person. In this paper we present a method for normalizing the spectrogram of a speech sound. This normalization removes the differences between two voice samples that are caused by factors such as speaking rate, loudness, and the frequency response of the recording device. As a result, we expect a normalized spectrogram to reflect mainly the tonal characteristics of its speaker. We use the correlation coefficients of the normalized spectrograms and of their two derivatives to measure the tonal similarity of two voices. In the experiment, we collected voice samples from 36 males and 34 females and used the proposed method to conduct speaker verification. When only one sentence (around 8 to 10 Chinese characters) was used, we achieved 95% accuracy. When we increased the number of sentences to 7, the accuracy exceeded 99% for all three coefficients.
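The normalize-then-correlate idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual normalization procedure, so everything below is an assumption made for illustration: the function names `normalize_spectrogram` and `correlation`, the fixed frame count `n_frames`, the linear time resampling, and the per-frequency mean removal in the log domain are hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def normalize_spectrogram(spec, n_frames=64, eps=1e-10):
    """Illustrative normalization (assumed, not the paper's procedure).

    spec: 2-D array of non-negative magnitudes, shape (freq_bins, time_frames).
    Returns an array of shape (freq_bins, n_frames).
    """
    # Work in the log domain, where an overall gain (loudness) and a static
    # per-frequency device response both become additive offsets.
    log_spec = np.log(np.asarray(spec, dtype=float) + eps)

    # Speaking-rate normalization (assumption): linearly resample the time
    # axis of every frequency row to a fixed number of frames.
    src = np.linspace(0.0, 1.0, log_spec.shape[1])
    dst = np.linspace(0.0, 1.0, n_frames)
    resampled = np.vstack([np.interp(dst, src, row) for row in log_spec])

    # Loudness and channel normalization (assumption): subtracting each
    # frequency row's mean over time removes the additive offsets above.
    resampled -= resampled.mean(axis=1, keepdims=True)

    # Scale to unit variance so that values are comparable across samples.
    std = resampled.std()
    return resampled / std if std > 0 else resampled


def correlation(a, b):
    """Pearson correlation coefficient between two equally shaped arrays."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

With this sketch, two recordings of the same sound that differ only in gain, device response, or duration should score close to 1 after normalization. Note that removing the per-frequency mean also discards the speaker's long-term average spectrum, which is one reason a real system may normalize differently.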

Parallel Keywords

speaker verification; spectrogram; normalization; formant; correlation coefficient

References

Bastys, A. (2010). The Use of Group Delay Features of Linear Prediction Model for Speaker Recognition. Informatica, 21, 1-12.


Becker, T. (2008). Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models. Proceedings of the Interspeech 2008 Special Session Forensic Speaker Recognition Traditional and Automatic Approaches.