
Speaker Identification by the Consistency of Human Voice

Advisor: 陳永耀

Abstract


In speaker identification research, timbre is the speech characteristic most often used to represent a speaker. Timbre is the primary auditory cue by which people distinguish speakers, and it is hidden in the harmonic components of the sound waveform, so most feature-extraction work in the literature focuses on frequency-domain properties. Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) are the most common features, but they were originally designed for speech recognition: their values vary with the spoken content, which limits identification performance. We therefore extend the idea proposed in [5], that the human voice has a consistent character, into a feature-extraction method whose feature vectors remain consistent for a given speaker regardless of what is said.

This thesis consists of two parts. First, using the idea that speaker-specific information resides in high-frequency bands, we improve the consistency of the feature vectors of [5] that describe the timbre difference between two speakers. Second, we modify the method of [5] to investigate the timbre consistency of an individual speaker: a vocal tract model yields the frequency response of the speech, a 22-order polynomial curve fit approximates that response, and the resulting 23 coefficients, after normalization, form a 23-dimensional feature vector, which we find is also consistent. Finally, we use this feature vector to perform speaker identification and obtain good identification performance.
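The feature-extraction pipeline described above can be sketched as follows. This is a minimal illustration under assumed implementation choices, not the thesis code: the LPC order (12), FFT size (512), the use of the log-magnitude response, and the function names `lpc` and `extract_feature` are all assumptions introduced here; only the degree-22 polynomial fit and the normalized 23-dimensional coefficient vector come from the abstract.

```python
import numpy as np

def lpc(frame, order):
    """All-pole vocal-tract model via the autocorrelation method
    (Levinson-Durbin recursion). Returns A(z) coefficients [1, a1, ..., ap].
    The LPC order used below is an assumption, not specified in the abstract."""
    n = len(frame)
    r = [float(np.dot(frame[: n - i], frame[i:])) for i in range(order + 1)]
    a, err = [1.0], r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return np.asarray(a)

def extract_feature(frame, lpc_order=12, poly_deg=22, nfft=512):
    """Fit a degree-22 polynomial to the log-magnitude frequency response of
    the vocal-tract filter and return the 23 coefficients, normalized to
    unit length, as the speaker feature vector."""
    a = lpc(np.asarray(frame, dtype=float), lpc_order)
    log_mag = -np.log(np.abs(np.fft.rfft(a, n=nfft)))  # |H| = 1 / |A|
    x = np.linspace(0.0, 1.0, len(log_mag))            # normalized frequency axis
    # Polynomial.fit rescales x to [-1, 1] internally, which keeps the
    # degree-22 least-squares fit numerically well conditioned.
    coef = np.polynomial.Polynomial.fit(x, log_mag, deg=poly_deg).coef
    return coef / np.linalg.norm(coef)                 # 23-dimensional unit vector
```

Under the consistency claim, vectors extracted from different utterances of the same speaker should lie close together (e.g. under cosine similarity), so identification can reduce to a nearest-neighbor comparison against enrolled speaker vectors.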

References


[1] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed., Academic Press, 2003.
[2] S. Seneff, "A joint synchrony/mean-rate model of auditory speech processing," Journal of Phonetics, vol. 16, pp. 55-76, 1988.
[3] P. Vary and R. Martin, Digital Speech Transmission: Enhancement, Coding and Error Concealment, New York: John Wiley, 2006.
[4] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
[16] X. D. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.
