This study implements an online Mandarin speech synthesis system with speaker adaptation and proposes a feature substitution method to improve the synthesized speech. The user enters the text to be synthesized; the system segments it into words, tags the tones, and synthesizes speech with the acoustic model the user selects. The system also provides speaker adaptation. The user records a few sentences through a web interface, and the system scores each recording against its prompt text to decide whether to accept the utterance. Once recording is complete, a back-end process automatically performs speaker adaptation and trains an acoustic model for that user. In addition, this study proposes a feature substitution method to improve speaker-adapted synthesized speech. The method replaces the spectral features estimated by the acoustic model with spectral features taken from real speech segments, which increases the similarity between the synthesized speech and the target speaker's voice. In a mean opinion score (MOS) evaluation, the proposed method scored 0.4 points higher than the original speaker-adapted synthesis.
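The feature substitution idea can be illustrated with a minimal sketch. It is not the implementation used in this study, and it rests on several assumptions that the abstract does not state: substitution is done per labeled unit (for example, a phone or syllable), real segments are looked up by unit label, and a real segment is fitted to the predicted duration by nearest-frame resampling. All names (`substitute_features`, `resample`, `real_bank`) are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[float]             # one spectral feature vector (e.g. cepstral coefficients)
Segment = Tuple[str, int, int]  # (unit label, start frame, end frame exclusive)


def resample(frames: List[Frame], target_len: int) -> List[Frame]:
    """Stretch or shrink a frame sequence to target_len by nearest-frame picking.

    Assumption: simple index mapping stands in for whatever duration
    matching a real system would use.
    """
    if target_len <= 0 or not frames:
        return []
    n = len(frames)
    return [list(frames[min(n - 1, i * n // target_len)]) for i in range(target_len)]


def substitute_features(
    predicted: List[Frame],
    segments: List[Segment],
    real_bank: Dict[str, List[Frame]],
) -> List[Frame]:
    """Replace model-estimated spectral frames with real-speech frames.

    For each segment whose label has a recorded counterpart in real_bank,
    the predicted frames are overwritten with the (resampled) real frames.
    Segments without a real counterpart keep the model's estimate.
    """
    output = [list(frame) for frame in predicted]  # copy; leave the input untouched
    for label, start, end in segments:
        real = real_bank.get(label)
        if not real:
            continue  # no real speech for this unit: keep the model estimate
        output[start:end] = resample(real, end - start)
    return output
```

In this sketch the total number of frames is unchanged, so the other parameter streams produced by the acoustic model (for example, the fundamental frequency) would stay aligned with the substituted spectral frames.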