
結合聲學與韻律訊息之強健性語者辨認

Combination of Acoustic and Prosodic Information for Robust Speaker Recognition

Advisor: 廖元甫

Abstract


Speaker identification systems deployed over the public telephone network typically suffer from handset mismatch and insufficient recognition data. To improve robustness, we propose a framework that fuses low-level acoustic and high-level prosodic information. First, (1) maximum likelihood a priori knowledge interpolation (ML-AKI) is used to estimate and compensate for the acoustic characteristics of the handset; (2) speaker models are trained under the minimum classification error (MCE) criterion, which widens the score margins between different speaker models and yields more accurate speaker models; (3) eigen-prosody analysis (EPA) is used as a complementary cue, projecting all speakers into a compact eigen-prosody space in which inter-speaker distances are measured; finally, (4) linear regression is used to fuse the acoustic and prosodic model scores into the identification result.

The experiments used the Linguistic Data Consortium (LDC) HTIMIT corpus (ten different handsets), and the proposed methods were evaluated in a leave-one-out manner. With the conventional MAP-GMM/CMS method as the baseline, the average speaker identification rate was 60.2%. Combining ML-AKI, MCE, and EPA with MAP-GMM/CMS raised the average identification rate to 79.3%. Considering only the unseen handsets, the average identification rate also improved from 58.3% to 74.6%. These results show that, compared with the conventional MAP-GMM/CMS method, the combination of ML-AKI, MCE/GPD, EPA, and MAP-GMM achieves effective improvements in both seen- and unseen-handset conditions.
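As a concrete illustration of step (4), the sketch below shows one way acoustic (MAP-GMM) and prosodic (EPA) scores could be fused at the score level. It is a minimal sketch under stated assumptions: the function name `fuse_scores`, the score normalization, and the fixed weights are illustrative only; in the thesis the combination weights are obtained by linear regression rather than set by hand.

```python
# Minimal sketch of score-level fusion of acoustic and prosodic scores by a
# linear combination. Names and weights are illustrative assumptions, not the
# thesis implementation; the thesis fits the weights by linear regression.
import numpy as np

def fuse_scores(acoustic_scores, prosodic_scores, w_acoustic=0.7, w_prosodic=0.3):
    """Linearly combine per-speaker acoustic and prosodic scores.

    acoustic_scores : np.ndarray, shape (n_speakers,)
        e.g. MAP-GMM log-likelihoods of the test utterance for each enrolled speaker.
    prosodic_scores : np.ndarray, shape (n_speakers,)
        e.g. negated distances measured in the eigen-prosody (EPA) space.
    """
    # Normalize each score stream to zero mean / unit variance so the two
    # information sources are on a comparable scale before fusion.
    a = (acoustic_scores - acoustic_scores.mean()) / (acoustic_scores.std() + 1e-9)
    p = (prosodic_scores - prosodic_scores.mean()) / (prosodic_scores.std() + 1e-9)
    return w_acoustic * a + w_prosodic * p

# Identification decision: pick the speaker with the highest fused score.
acoustic = np.array([-120.5, -118.2, -125.0])  # hypothetical GMM log-likelihoods
prosodic = np.array([-0.8, -0.3, -1.1])        # hypothetical negated EPA distances
predicted_speaker = int(np.argmax(fuse_scores(acoustic, prosodic)))
```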

Abstract (English)


Unseen handset mismatch is the major source of performance degradation for closed-set speaker identification in telecommunication environments. To compensate for handset mismatch with only limited training/test data, a maximum likelihood a priori knowledge interpolation (ML-AKI) approach and an eigen-prosody analysis (EPA) approach were proposed and fused for robust speaker identification. Experimental results on HTIMIT show that the ML-AKI+EPA+MCE+MAP-GMM/CMS fusion approach achieved 79.3% average speaker identification accuracy, much better than the traditional MAP-GMM/CMS baseline (60.2%). Moreover, in the leave-one-out experiments, the average identification rate on the nine unseen handsets was raised from 58.3% (MAP-GMM/CMS) to 74.6% (ML-AKI+EPA+MCE+MAP-GMM/CMS). The proposed ML-AKI and EPA fusion method is therefore a promising approach to robust speaker identification under both seen and unseen handset distortion.
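For reference, the following sketch shows one possible reading of the leave-one-out evaluation described above: each handset is held out in turn as the unseen test condition while the remaining handsets are treated as seen, and identification accuracy is averaged over the held-out turns. The `identify` callback and the data layout are hypothetical placeholders, not the recognizer used in the thesis.

```python
# Illustrative sketch of a leave-one-out evaluation over handsets (assumed
# protocol). identify() stands in for the full recognizer, e.g. MAP-GMM/CMS
# with or without ML-AKI / MCE / EPA.

def average_identification_rate(trials_by_handset, identify):
    """trials_by_handset: dict mapping handset id -> list of (features, true_speaker)."""
    rates = []
    for held_out, trials in trials_by_handset.items():
        # Handsets other than the held-out one are available as "seen" data.
        seen_handsets = [h for h in trials_by_handset if h != held_out]
        correct = sum(1 for feats, spk in trials
                      if identify(feats, train_handsets=seen_handsets) == spk)
        rates.append(correct / len(trials))
    # Average over the held-out (unseen-handset) turns, as in the
    # 58.3% vs. 74.6% unseen-handset comparison reported above.
    return sum(rates) / len(rates)
```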

References


【2】 S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 254-272, Apr. 1981.
【3】 M. G. Rahim and B. H. Juang, "Signal bias removal by maximum likelihood estimation for robust telephone speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 4, no. 1, pp. 19-30, Jan. 1996.
【4】 D. A. Reynolds, "HTIMIT and LLHDB: Speech corpora for the study of handset transducer effects," in Proc. ICASSP '97, vol. II, pp. 1535-1538, 1997.
【5】 D. Reynolds, T. Quatieri, and R. Dunn, "Speaker Verification Using Adapted Gaussian Mixture Models," Digital Signal Processing, vol. 10, pp. 19-41, Jan. 2000.
【6】 R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score Normalization for Text-Independent Speaker Verification Systems," Digital Signal Processing, vol. 10, pp. 42-54, Jan. 2000.

Cited By


李信廷 (2006). 改善最小錯誤鑑別式之語者辨認方法 [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0207200917340387

Further Reading