
基於特徵替換法對語者調適語音合成之改進

On the Use of Speech Feature Substitution for Speaker Adaption within HMM-based TTS

Advisor: Jyh-Shing Roger Jang (張智星)

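Abstract


This thesis implements an online system for speaker adaptation and Mandarin speech synthesis, and proposes a feature substitution method to improve the synthesized speech. The user enters the text to be synthesized; the system segments the text into words, labels the tones, and synthesizes speech with the acoustic model the user selects. The system also provides speaker adaptation: the user records speech online, and the system scores each recording against its script to decide whether to accept it as training data. Once recording is finished, a back-end program automatically performs speaker adaptation and trains an acoustic model for that user. In addition, this thesis proposes a feature substitution method to improve speech synthesized from speaker-adapted models. The method replaces the spectral features estimated by the acoustic model with spectral features taken from real speech segments, which increases the similarity between the synthesized audio and the target speaker's pronunciation. In MOS evaluation, this method scored 0.4 points higher than the original speaker-adapted synthesis.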

Parallel Abstract


This study implements an online Mandarin speech synthesis system with speaker adaptation and proposes a speech feature substitution approach to improve the quality of the synthesized speech. The system takes text provided by the user as input, performs POS and tone tagging, and synthesizes speech with the acoustic model the user chooses. The system also provides a speaker adaptation function. First, the user records a few sentences through a web interface. A speech scoring technique validates the quality of each recorded utterance. The system then uses the accepted utterances to adapt the acoustic models used for synthesis. Moreover, this study proposes a speech feature substitution method to improve the quality of speaker-adapted synthesis. Instead of estimating spectral features from the acoustic models, the method uses spectral features extracted from real speech utterances, which increases the similarity between the synthesized speech and the target speaker's speech. Experimental results show that the proposed method improves on the original speaker-adapted synthesis by 0.4 points in mean opinion score (MOS).
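
The substitution step described in the abstract can be pictured with a minimal Python sketch. The abstract does not say how real segments are selected, aligned, or smoothed, so everything below is an assumption made for illustration: the names `resample_frames` and `substitute_features` are hypothetical, segments are assumed to be matched by label, and a real segment is assumed to be stretched to the model-predicted duration by linear interpolation. It is not the thesis implementation.

```python
import numpy as np


def resample_frames(frames: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a (n_frames, dim) feature matrix to target_len frames."""
    n_frames, dim = frames.shape
    if n_frames == target_len:
        return frames.copy()
    if n_frames == 1:
        # A single frame cannot be interpolated; repeat it instead.
        return np.repeat(frames, target_len, axis=0)
    src = np.linspace(0.0, 1.0, n_frames)
    dst = np.linspace(0.0, 1.0, target_len)
    columns = [np.interp(dst, src, frames[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)


def substitute_features(generated, segments, real_units):
    """Replace model-estimated spectral frames with frames from real speech.

    generated:  (T, D) spectral features estimated by the acoustic model.
    segments:   list of (label, start, end) frame spans, end exclusive
                (assumed to come from the synthesizer's duration model).
    real_units: dict mapping label -> (n, D) spectral features extracted
                from the target speaker's recordings (hypothetical store).
    """
    out = generated.copy()
    for label, start, end in segments:
        real = real_units.get(label)
        if real is None or end <= start:
            continue  # no real segment available: keep the model estimate
        out[start:end] = resample_frames(real, end - start)
    return out
```

Segments without a recorded counterpart keep the model-estimated features, so the sketch degrades to ordinary speaker-adapted synthesis when no real speech is available. Other parameters, such as pitch and duration, are left untouched here.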

