Speaker identification systems deployed over the public telephone network typically suffer from handset mismatch and from insufficient identification data. To improve the robustness of speaker identification, we propose a framework that fuses low-level acoustic and high-level prosodic information. First, (1) maximum likelihood a priori knowledge interpolation (ML-AKI) is used to estimate and compensate for the acoustic characteristics of the handset; (2) minimum classification error (MCE) discriminative training is then applied to the speaker models to enlarge the score margins between different speakers and thus obtain more accurate models; (3) eigen-prosody analysis (EPA) is adopted as an auxiliary cue, projecting all speakers into a compact eigen-prosody space in which inter-speaker distances are measured (a sketch of this projection follows below); and finally (4) linear regression is used to fuse the acoustic and prosodic model scores to produce the identification result. The proposed methods were evaluated with a leave-one-out scheme on the HTIMIT corpus from the Linguistic Data Consortium (LDC), which covers ten different handsets. With the conventional MAP-GMM/CMS approach as the baseline, the average speaker identification rate was 60.2%; combining ML-AKI, MCE, and EPA with MAP-GMM/CMS raised it to 79.3%. On the unseen-handset portion alone, the average identification rate improved from 58.3% to 74.6%. These results show that, compared with the conventional MAP-GMM/CMS method, the combination of ML-AKI, MCE/GPD, EPA, and MAP-GMM yields effective improvements under both seen- and unseen-handset conditions.
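To make the EPA step concrete, here is a minimal sketch of projecting per-speaker prosodic feature vectors into a compact eigen-prosody space via PCA (computed with an SVD) and measuring inter-speaker distances there. This is an illustration under assumed conventions, not the paper's implementation: the function names, the choice of prosodic features (e.g., pitch-contour statistics), and the number of retained axes are all hypothetical.

import numpy as np

def epa_project(prosody_vectors, n_eigen=4):
    """Project speaker prosody vectors onto the top eigen-prosody axes.

    prosody_vectors: (n_speakers, n_features) array, one row per speaker
    (e.g., pitch-contour statistics; illustrative assumption).
    Returns (n_speakers, n_eigen) coordinates in the eigen-prosody space.
    """
    mean = prosody_vectors.mean(axis=0)
    centered = prosody_vectors - mean
    # SVD of the centered data yields the principal (eigen-prosody) axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_eigen].T

def epa_distances(test_coord, speaker_coords):
    """Euclidean distance from one test point to every enrolled speaker."""
    return np.linalg.norm(speaker_coords - test_coord, axis=1)

In this view, a smaller distance in the eigen-prosody space indicates a prosodically closer speaker, which is the auxiliary cue fused with the acoustic score in step (4).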
Unseen handset mismatch is the major source of performance degradation for closed-set speaker identification in telecommunication environments. To compensate for handset mismatch with little available training/test data, a maximum likelihood a priori knowledge interpolation (ML-AKI) approach and an eigen-prosody analysis (EPA) approach are proposed and fused for robust speaker identification. Experimental results on HTIMIT show that the ML-AKI+EPA+MCE+MAP-GMM/CMS fusion approach achieves 79.3% average speaker identification accuracy, much better than the traditional MAP-GMM/CMS baseline (60.2%). Moreover, over the nine unseen-handset turns of the leave-one-out experiment, the average speaker identification rate increases from 58.3% (MAP-GMM/CMS) to 74.6% (ML-AKI+EPA+MCE+MAP-GMM/CMS). The proposed ML-AKI and EPA fusion method is therefore a promising approach to robust speaker identification under both seen and unseen handset distortion.
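The score-fusion step can be sketched as ordinary least-squares linear regression over the two per-trial scores, as stated in the abstract. The framing below (fitting weights against 0/1 target labels on held-out trials, then picking the speaker with the highest fused score) is an assumed setup for illustration; the actual regression targets and normalization used in the paper may differ.

import numpy as np

def fit_fusion_weights(acoustic_scores, prosody_scores, labels):
    """Learn linear-regression fusion weights on held-out trials.

    acoustic_scores, prosody_scores: (n_trials,) float arrays of scores
    (e.g., GMM log-likelihoods and negated eigen-prosody distances;
    an illustrative assumption).
    labels: (n_trials,) array, 1.0 for target-speaker trials, else 0.0.
    Returns (w0, w1, w2) for: fused = w0 + w1*acoustic + w2*prosody.
    """
    X = np.column_stack([np.ones_like(acoustic_scores),
                         acoustic_scores, prosody_scores])
    w, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return w

def identify(w, acoustic_per_speaker, prosody_per_speaker):
    """Closed-set decision: the speaker with the highest fused score wins."""
    fused = w[0] + w[1] * acoustic_per_speaker + w[2] * prosody_per_speaker
    return int(np.argmax(fused))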