
Speaker Recognition Based on Spectral Dimension (基頻於頻譜維度之語者辨識)

Advisor: 陳文雄

Abstract


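Speaker recognition uses speech features to authenticate identity. This thesis proposes a speaker recognition system that combines Mel-frequency cepstral coefficients (MFCC) with the spectral dimension (SD). The basic concept of the spectral dimension derives from the fractal dimension: it captures nonlinear characteristics of the speech spectrum, that is, the relationship between the spectrum and frequency, and it is extracted with a least-squares method. To further improve performance, the spectrum is partitioned on the Mel scale and a spectral dimension is extracted from each partition. Pattern recognition uses multiple Gaussian mixture models (GMMs). The thesis also examines properties of the spectral dimension, compares it briefly with other, simpler spectral features, and reports improved performance in speaker verification and identification experiments. The experiments use the AURORA 2.0 speech database, which contains 52 male and 57 female speakers, each with 77 clean digit-string utterances of varying length. Speaker verification is tested separately on the male set, the female set, and the combined set; with the proposed method, performance improves for GMMs of every mixture count tested. Taking male speaker verification as an example, with 32 mixtures the 12-dimensional spectral dimension alone gives an equal error rate (EER) of 4.3875%, while the 12-dimensional MFCC baseline gives 2.6906%. Combining the two features lowers the EER to 2.0968%, an improvement rate of 21.91%.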
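The abstract does not give the exact formula for the spectral dimension, so the following is only a minimal sketch of the general idea it describes: split the spectrum into Mel-spaced sub-bands and fit, by least squares, how the spectrum varies with frequency inside each band. It assumes (this is not from the thesis) that the per-band feature is the slope of log magnitude against log frequency. The function names, the 100 to 4000 Hz range, and the synthetic power-law spectrum are all hypothetical choices made for illustration; the 12 bands echo the 12-dimensional feature mentioned above.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edge frequencies (Hz), equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]

def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def band_slopes(freqs, mags, edges):
    """ASSUMPTION: one log-log slope per band stands in for the spectral dimension."""
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pts = [(math.log(f), math.log(m))
               for f, m in zip(freqs, mags)
               if lo <= f < hi and m > 0.0]
        if len(pts) < 2:
            feats.append(0.0)  # too few bins in this band to fit a line
        else:
            xs, ys = zip(*pts)
            feats.append(ls_slope(xs, ys))
    return feats

# Synthetic magnitude spectrum following a power law |X(f)| = f^(-1.5),
# so the fitted log-log slope should be close to -1.5 in every band.
freqs = [k * 15.625 for k in range(1, 257)]  # 256 bins up to 4000 Hz
mags = [f ** -1.5 for f in freqs]
edges = mel_band_edges(100.0, 4000.0, 12)
feats = band_slopes(freqs, mags, edges)
```

On real speech the magnitudes would come from a short-time Fourier transform of each frame, and the resulting 12 values per frame would play the role that the 12 MFCCs play in the baseline system.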

Parallel Abstract


Speaker recognition uses speech features to authenticate identity. This thesis describes a speaker recognition system that combines Mel-frequency cepstral coefficients (MFCC) with the spectral dimension (SD). The basic concept of the spectral dimension comes from the fractal dimension. It captures nonlinear spectral features that describe how the spectrum varies with frequency. The least-squares method (LSM) is used to extract the spectral dimension. To improve experimental performance, we also adopt a Mel-scale method to allocate sub-bands and a multi-Gaussian mixture model (multi-GMM) for pattern matching. We then discuss other properties of the spectral dimension and compare it with other simple spectral features. We show improved performance in our speaker verification and identification tasks. Our speaker recognition system is evaluated on the AURORA 2.0 database, which contains 52 male and 57 female speakers, each with 77 clean digit-string utterances of varying length. Experiments are run on the 52 male speakers, the 57 female speakers, and all speakers together. We observe that adding the proposed method improves performance across different numbers of mixture components. With 32 mixture components, the 52-male speaker verification system based on SD gives an equal error rate (EER) of 4.3875%, and the system based on MFCC gives an EER of 2.6906%. Combining the proposed SD and MFCC features yields an EER of 2.0968%, an improvement rate of 21.91%.
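The results above are reported as equal error rates. As a reminder of what that metric measures, here is a minimal sketch of one common way to approximate the EER from verification scores: sweep a decision threshold and take the point where the false acceptance rate and false rejection rate are closest. The function name and the toy scores are hypothetical and are not data from the thesis; the thesis may compute the EER differently (for example, by interpolating along the error curve).

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores (hypothetical): one genuine trial scores below one impostor trial.
genuine = [0.9, 0.8, 0.7, 0.35]
impostor = [0.1, 0.2, 0.3, 0.4]
eer = equal_error_rate(genuine, impostor)  # 0.25 for these toy scores
```

With the toy scores, a threshold of 0.4 rejects one of four genuine trials and accepts one of four impostor trials, so both error rates meet at 25%.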


