Thesis


Latent Prosody Model-Based Robust Speaker Verification

Advisor: 廖元甫


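This thesis uses a latent prosody model (LPM) to investigate the relationships among speaker, tone, and prosody in Mandarin Chinese. For a tonal language, we use the LPM to capture the differences between speakers and then perform speaker verification with the resulting latent-prosody speaker models. First, an automatic speech recognizer segments the speech into syllables, and each syllable's pitch trajectory is represented by orthogonal polynomial basis coefficients. Next, prosody states are introduced to model the underlying prosody distribution and the transition probabilities between states. Finally, speaker verification experiments are carried out with each speaker's latent prosody model.

The experiments are conducted mainly on the ISCSLP2006-SRE corpus. Processing the feature parameters with MVA and usable reduces the EER on ISCSLP2006-SRE from 5.68% to 5.04%. An automatic speech recognizer aided by pitch contours is then used to improve recognition accuracy, which lowers the EER of the short-term spectral system on ISCSLP2006-SRE to 4.84%; the conventional long-term prosodic system has an EER of 38.37%. Within the LPM framework, hard decisions reduce the EER of the long-term prosodic system to 30.17% and soft decisions reduce it to 29.17%, and the LPM speaker-model experiment lowers it further to 29.13%. Finally, score-level fusion of all systems brings the EER on ISCSLP2006-SRE down to 4.00%.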
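The segmentation step represents each syllable's pitch trajectory by orthogonal polynomial coefficients. A minimal sketch of that representation follows. It assumes equally spaced pitch samples per syllable, a discrete orthogonal basis built by Gram-Schmidt on the monomials (a discrete analogue of Legendre polynomials), and a third-order expansion. The function names `orthogonal_basis`, `contour_coefficients`, and `reconstruct` are hypothetical and not taken from the thesis.

```python
import math


def _mean_inner(u, v):
    # Inner product normalized by length, so the degree-0 basis vector is all ones.
    return sum(a * b for a, b in zip(u, v)) / len(u)


def orthogonal_basis(n_points, order=3):
    """Hypothetical helper: discrete orthonormal polynomials of degree 0..order on
    n_points equally spaced times in [0, 1], via Gram-Schmidt on t**k."""
    if n_points <= order:
        raise ValueError("need more samples than the polynomial order")
    ts = [i / max(n_points - 1, 1) for i in range(n_points)]
    basis = []
    for k in range(order + 1):
        v = [t ** k for t in ts]
        for b in basis:  # remove components along lower-degree basis vectors
            proj = _mean_inner(v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(_mean_inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis


def contour_coefficients(contour, order=3):
    """Project one syllable's pitch contour onto the basis; a[0] is its mean level."""
    basis = orthogonal_basis(len(contour), order)
    return [_mean_inner(contour, b) for b in basis]


def reconstruct(coeffs, n_points):
    """Rebuild the (smoothed) contour from its coefficients."""
    basis = orthogonal_basis(n_points, len(coeffs) - 1)
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n_points)]
```

Because the inner product is normalized by the number of samples, the first coefficient is the syllable's mean pitch level and the higher coefficients describe slope and curvature. Log-F0 values are a common choice of input, but the abstract does not say which pitch scale is used.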

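The prosody-state transition probabilities can be illustrated with a simple bigram model over state labels. This is a sketch under stated assumptions: states are integers `0..n_states-1`, probabilities use add-alpha smoothing, and a test utterance is scored by the average per-transition log-likelihood ratio between a speaker's bigram and a background bigram. The thesis does not specify its smoothing or scoring rule, and `train_bigram`, `log_likelihood`, and `llr_score` are hypothetical names.

```python
import math
from collections import Counter


def train_bigram(sequences, n_states, alpha=1.0):
    """Estimate P(s_t | s_{t-1}) from prosody-state sequences with add-alpha smoothing."""
    counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
    probs = {}
    for prev in range(n_states):
        total = sum(counts[(prev, cur)] for cur in range(n_states))
        for cur in range(n_states):
            probs[(prev, cur)] = (counts[(prev, cur)] + alpha) / (total + alpha * n_states)
    return probs


def log_likelihood(seq, probs):
    """Sum of log transition probabilities along one state sequence."""
    return sum(math.log(probs[(p, c)]) for p, c in zip(seq, seq[1:]))


def llr_score(seq, speaker_probs, background_probs):
    """Average per-transition log-likelihood ratio, speaker model vs background model."""
    n = max(len(seq) - 1, 1)
    return (log_likelihood(seq, speaker_probs) - log_likelihood(seq, background_probs)) / n
```

A positive score favors the claimed speaker; in a verification system it would be compared against a threshold tuned on development data.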

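The abstract contrasts hard and soft decisions within the LPM framework without defining them. One plausible reading, offered only as an assumption, is that a hard decision assigns each syllable to its single most probable prosody state, while a soft decision accumulates expected counts weighted by the state posteriors. The sketch below compares the two on per-syllable posteriors. For simplicity it approximates the pairwise posterior of adjacent states by the product of their individual posteriors, whereas an HMM-style model would obtain it from the forward-backward algorithm. `hard_bigram_counts` and `soft_bigram_counts` are hypothetical names.

```python
def hard_bigram_counts(posteriors):
    """Hard decision: label each syllable with its most probable state, then count."""
    labels = [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]
    n = len(posteriors[0])
    counts = [[0.0] * n for _ in range(n)]
    for prev, cur in zip(labels, labels[1:]):
        counts[prev][cur] += 1.0
    return counts


def soft_bigram_counts(posteriors):
    """Soft decision: accumulate expected counts, approximating the pairwise
    posterior P(s_{t-1}=i, s_t=j) by the product of the two per-syllable posteriors."""
    n = len(posteriors[0])
    counts = [[0.0] * n for _ in range(n)]
    for p_prev, p_cur in zip(posteriors, posteriors[1:]):
        for i in range(n):
            for j in range(n):
                counts[i][j] += p_prev[i] * p_cur[j]
    return counts
```

With confident posteriors the two count tables agree; the soft version spreads mass across alternatives when a syllable's state is ambiguous, while both tables still sum to the number of transitions.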
This thesis addresses the correlation between prosody and tone in a Chinese corpus, the complementarity of prosodic and spectral information for speaker verification, and the performance of the different systems. Within the LPM framework, we build a prosody state model and a tone model that account for the mutual effects of tone and prosody. We then use the tone model and the prosody state model to obtain the optimal prosody state sequence and build a prosody state bigram model, which is used to improve the performance of the spectral speaker verification system. Our experiments are conducted on the ISCSLP2006-SRE corpus. Applying MVA and usable processing to the features yields an EER of 5.04% on ISCSLP2006-SRE, an improvement over CMS (cepstral mean subtraction). We then use an automatic speech recognizer and pitch contours to determine speech segment boundaries, which improves recognizer performance and lowers the EER on ISCSLP2006-SRE to 4.83%. Finally, linear score fusion with the prosody state bigram model obtained from the LPM framework reduces the EER on ISCSLP2006-SRE to 4.00%.
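Both abstracts credit MVA with lowering the EER but do not expand the acronym. Assuming it denotes the mean subtraction, variance normalization, and ARMA filtering chain proposed by Chen and Bilmes for robust speech features, a per-utterance version might look like the sketch below. The ARMA order `order=2`, the `eps` guard, and the function name `mva` are illustrative assumptions, and the separate usable processing is not modeled.

```python
import math


def mva(frames, order=2, eps=1e-8):
    """Hypothetical MVA sketch: mean subtraction, variance normalization, then ARMA
    smoothing, applied per feature dimension of one utterance
    (frames: list of equal-length lists of floats)."""
    T, D = len(frames), len(frames[0])
    out = [[0.0] * D for _ in range(T)]
    M = order
    for d in range(D):
        col = [f[d] for f in frames]
        mean = sum(col) / T
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / T)
        x = [(v - mean) / (std + eps) for v in col]  # mean/variance normalized
        y = list(x)  # the first and last M frames pass through unchanged
        for t in range(M, T - M):
            past = sum(y[t - i] for i in range(1, M + 1))  # already-filtered outputs
            future = sum(x[t + j] for j in range(M + 1))  # current and next M inputs
            y[t] = (past + future) / (2 * M + 1)
        for t in range(T):
            out[t][d] = y[t]
    return out
```

The autoregressive term reuses already-smoothed outputs, so the filter acts as a low-pass smoother on each normalized feature trajectory.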

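Both abstracts report their results as an equal error rate (EER), the operating point where the false-rejection rate on target trials equals the false-acceptance rate on impostor trials. A minimal sketch, assuming higher scores mean the trial is more likely from the claimed speaker and approximating the EER by sweeping the threshold over the observed scores; `equal_error_rate` is a hypothetical helper, not the thesis's evaluation tool.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the decision threshold over every observed
    score and returning the mean of FRR and FAR where they are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for th in thresholds:
        # A trial is accepted when its score is >= th.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```

The sweep is quadratic in the number of trials, which is fine for illustration; larger evaluations would sort the scores once or use an ROC/DET routine from an existing toolkit.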

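The final 4.00% EER comes from linear score-level fusion of the spectral and prosodic systems. The sketch below assumes each system's scores are z-normalized and then combined with a single weight `weight` on the spectral score. The thesis does not state its normalization or weights; in practice both would be estimated on a development set rather than on the evaluation trials themselves, as this simplified version does. `znorm` and `linear_fusion` are hypothetical names.

```python
def znorm(scores):
    """Normalize a list of scores to zero mean and unit variance (constant -> zeros)."""
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return [(s - mean) / std if std > 0 else 0.0 for s in scores]


def linear_fusion(spectral, prosodic, weight=0.8):
    """Fuse two systems' trial scores as weight * spectral + (1 - weight) * prosodic."""
    zs, zp = znorm(spectral), znorm(prosodic)
    return [weight * a + (1 - weight) * b for a, b in zip(zs, zp)]
```

Normalizing first keeps one system's score range from dominating the sum, so the weight reflects how much each system is trusted.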
