

An Implementation of HMM-based English Speech Synthesis

Advisor: 陳信宏


This thesis implements an online English text-to-speech system based on hidden Markov models (HMMs). The speech corpus consists of TOEFL articles read by a female speaker whose native language is Chinese. The corpus is first segmented with a reasonably accurate triphone model. The CMU dictionary and the Stanford-Postagger are then used to add prosodic information to the labels, namely the relative positions within a five-level structure of phone, syllable, word, phrase, and sentence. Vocal tract, fundamental frequency, and state duration models are built from these labels, with the aim of making the prosody and rhythm of the synthesized speech more natural.

Experimental results show that the generated prosody is still not natural enough. Compared with speech synthesized by other websites abroad, the overall prosodic contour of our system is somewhat more pronounced, but the voice is noticeably blurred and shows odd fine-grained pitch movements. The likely cause is that the prosodic labels are currently estimated by rule-based methods only, so the predicted prosodic information is not accurate enough. As a result, the prosody of the synthesized speech is broadly correct, but the pitch fluctuates erratically at a finer level.
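The five-level position labelling mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy lexicon (CMU-style ARPAbet phones with hand-written syllable grouping), the hand-supplied phrase boundaries, and the names `TOY_LEXICON`, `PhoneContext`, `label_sentence`, and `format_label` are all assumptions made for illustration, and the output string is a made-up format rather than the label format used by any HMM synthesis toolkit.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: CMU-dictionary-style ARPAbet phones.
# The grouping into syllables is hand-written for this example only.
TOY_LEXICON = {
    "speech": [["S", "P", "IY1", "CH"]],
    "synthesis": [["S", "IH1", "N"], ["TH", "AH0"], ["S", "AH0", "S"]],
    "is": [["IH1", "Z"]],
    "hard": [["HH", "AA1", "R", "D"]],
}


@dataclass(frozen=True)
class PhoneContext:
    """Relative positions of one phone; each pair is (1-based index, total)."""
    phone: str
    phone_in_syllable: tuple
    syllable_in_word: tuple
    word_in_phrase: tuple
    phrase_in_sentence: tuple


def label_sentence(phrases):
    """Return one PhoneContext per phone.

    `phrases` is a list of phrases, each a list of lowercase words.
    Phrase boundaries are supplied by the caller here; a real system
    would have to predict them (for example from part-of-speech tags).
    """
    contexts = []
    for p_idx, phrase in enumerate(phrases, start=1):
        for w_idx, word in enumerate(phrase, start=1):
            syllables = TOY_LEXICON[word]
            for s_idx, syllable in enumerate(syllables, start=1):
                for ph_idx, phone in enumerate(syllable, start=1):
                    contexts.append(PhoneContext(
                        phone=phone,
                        phone_in_syllable=(ph_idx, len(syllable)),
                        syllable_in_word=(s_idx, len(syllables)),
                        word_in_phrase=(w_idx, len(phrase)),
                        phrase_in_sentence=(p_idx, len(phrases)),
                    ))
    return contexts


def format_label(ctx):
    """Render a context as a readable string (illustrative format only)."""
    return (
        f"{ctx.phone} "
        f"ph:{ctx.phone_in_syllable[0]}/{ctx.phone_in_syllable[1]} "
        f"syl:{ctx.syllable_in_word[0]}/{ctx.syllable_in_word[1]} "
        f"wd:{ctx.word_in_phrase[0]}/{ctx.word_in_phrase[1]} "
        f"phr:{ctx.phrase_in_sentence[0]}/{ctx.phrase_in_sentence[1]}"
    )


if __name__ == "__main__":
    for ctx in label_sentence([["speech", "synthesis"], ["is", "hard"]]):
        print(format_label(ctx))
```

Each phone carries its position at every level, which is the kind of context that context-dependent HMM training can cluster on. Errors in any level, such as a misplaced phrase boundary, propagate to every phone inside that unit, which is consistent with the abstract's explanation for the erratic fine-grained pitch.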


English speech synthesis




