This thesis investigates the relationship between prosodic and spectral information in speaker verification, the performance obtained when the two kinds of information are handled by different systems, and how to build a joint model that accounts for variation in both. Taking the correlation between a speaker's spectral and prosodic information into account, we construct an ergodic hidden Markov joint model. The states of the Markov model and the transition probabilities between them describe how the upper-level prosodic state of speech changes, while prosodic-state-dependent Gaussian mixture models (GMMs) describe the statistical characteristics of the lower-level spectral information within a given prosodic state. The joint model thereby describes how prosodic and spectral information vary over time.

Experiments were carried out mainly on the NIST2001-SRE and ISCSLP2006-SRE corpora. For feature processing we use MVA and usable; compared with cepstral mean subtraction (CMS), the equal error rate (EER) falls from 11.36% to 8.64% on NIST2001-SRE and from 6.02% to 5.04% on ISCSLP2006-SRE. Combining short-term and long-term features through prosodic-state-dependent speaker models lowers the EER further, to 8.28% on NIST2001-SRE and 4.8% on ISCSLP2006-SRE. Finally, integrating the system with the prosodic-state-dependent hidden Markov model lowers the EER to 8.19% on NIST2001-SRE and 4.7% on ISCSLP2006-SRE.
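The joint model described above can be illustrated with a minimal sketch: a forward-algorithm log-likelihood for an ergodic HMM whose states stand for prosodic states and whose emissions are state-dependent, diagonal-covariance GMMs over spectral frames. This is not the thesis implementation. The function names (`logsumexp`, `log_gmm`, `joint_log_likelihood`), the diagonal covariances, and the treatment of prosodic states as fully hidden with spectral frames as the only observations are all assumptions made for illustration.

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis)

def log_gmm(x, weights, means, variances):
    """Log-density of one frame x (D,) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = x - means
    log_comp = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                       + np.sum(diff ** 2 / variances, axis=1))
    return logsumexp(np.log(weights) + log_comp)

def joint_log_likelihood(frames, log_pi, log_A, gmms):
    """Forward algorithm in the log domain.

    frames: (T, D) spectral feature vectors.
    log_pi: (S,) log initial probabilities of the prosodic states (assumed).
    log_A:  (S, S) log transition matrix, log_A[i, j] = log P(state j | state i);
            ergodic, so every entry is finite.
    gmms:   list of S tuples (weights, means, variances), one GMM per state.
    """
    n_states = len(gmms)
    # Emission log-likelihood of every frame under every state's GMM: (T, S).
    log_b = np.array([[log_gmm(x, *gmms[s]) for s in range(n_states)]
                      for x in frames])
    alpha = log_pi + log_b[0]
    for t in range(1, len(frames)):
        # Sum over the previous state i for each current state j.
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(logsumexp(alpha))
```

Under these assumptions, a verification score could be formed as the difference between `joint_log_likelihood` under a claimed speaker's model and under a background model, then compared against a threshold; how the thesis actually forms its scores is not stated in the abstract.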