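The main objective of this study is to design a mixed Chinese-English text-to-speech (TTS) system for DAISY digital talking books. To achieve this goal, we first designed a bilingual corpus and used the Extended SAM Phonetic Alphabet (X-SAMPA) to unify the Chinese and English phone sets. We then analyzed the text to extract contextual information, focusing on contexts that can improve the naturalness of long passages: contexts above the sentence level, the interpretation of symbols, and whether a sentence is pure Chinese, pure English, or mixed Chinese-English. We also incorporated semantic analysis, designed the label files and the question set for decision-tree clustering, and finally performed synthesis. In the overall subjective evaluation, the reading acceptability of synthesized long articles scored 3.70, overall naturalness 3.19, and overall similarity 3.23. Broken down by language, naturalness scored 3.47 for Chinese, 2.93 for English, and 3.17 for mixed Chinese-English, and similarity scored 3.53 for Chinese, 2.92 for English, and 3.23 for mixed Chinese-English. Finally, a preference test between two systems, one with and one without semantic information, yielded a preference ratio of 50%:50%, indicating that semantic information had little effect. In summary, our system is acceptable, but there is considerable room for improvement.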
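As an illustration of the sentence-level language context (pure Chinese, pure English, or mixed Chinese-English), the following Python sketch classifies a sentence by character script and builds HTS-style yes/no decision-tree questions on that context. The helper names, the CJK range check (basic block U+4E00 to U+9FFF only), and the `/L:<ctx>/` label field are hypothetical assumptions for illustration, not the label format used in this work.

```python
def language_context(sentence: str) -> str:
    """Classify a sentence as 'zh', 'en', 'mixed', or 'none' (hypothetical helper).

    Assumption: Chinese is detected only via the CJK Unified Ideographs basic
    block (U+4E00..U+9FFF); English via ASCII letters. Digits, punctuation and
    other scripts are ignored.
    """
    has_zh = any("\u4e00" <= ch <= "\u9fff" for ch in sentence)
    has_en = any(ch.isascii() and ch.isalpha() for ch in sentence)
    if has_zh and has_en:
        return "mixed"
    if has_zh:
        return "zh"
    if has_en:
        return "en"
    return "none"


def language_question_set(field: str = "L") -> list[str]:
    """Build HTS-style yes/no questions on a hypothetical '/L:<ctx>/' label field."""
    return [f'QS "Sent_Lang=={ctx}" {{*/{field}:{ctx}/*}}' for ctx in ("zh", "en", "mixed")]


if __name__ == "__main__":
    for s in ["這是一本有聲書。", "Hello, world.", "我們使用DAISY格式"]:
        print(s, "->", language_context(s))
    for q in language_question_set():
        print(q)
```

Encoding the language context as a label field lets the decision tree split on it like any other contextual factor, so mixed sentences can receive their own prosodic models if the data supports that split.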
The goal of this study is to design a Mixed Chinese-English Speech Synthesis System for DAISY Digital Talking Books. To improve the quality of synthesized speech, especially for long paragraphs or even a whole story, several key points are carefully considered, including (1) the design and collection of a suitable bilingual corpus, (2) the unification of the English and Chinese transcriptions using the Extended SAM Phonetic Alphabet (X-SAMPA), and (3) the extraction of meaningful linguistic and semantic cues beyond the sentence level. In a subjective listening assessment, the overall mean opinion scores (MOSs) for acceptability, naturalness and similarity are 3.70, 3.19 and 3.23, respectively. Broken down by language, the naturalness scores for Chinese, English and mixed Chinese-English are 3.47, 2.93 and 3.17, respectively, and the similarity scores are 3.53, 2.92 and 3.23, respectively. In conclusion, our system is acceptable, but there is still considerable room for further improvement.
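To illustrate what unifying the two transcriptions with X-SAMPA means in practice, here is a minimal Python sketch, assuming Mandarin input in pinyin and English input in ARPAbet. The mapping tables are hypothetical and deliberately partial; they are not the phone set designed in this work, and the function names are illustrative only.

```python
# Hypothetical, partial mapping tables for illustration only; they are NOT
# the phone set designed in this work.
MANDARIN_TO_XSAMPA = {  # pinyin initials plus a few finals
    "b": "p", "p": "p_h", "d": "t", "t": "t_h", "g": "k", "k": "k_h",
    "zh": "ts`", "ch": "ts`_h", "sh": "s`",
    "j": "ts\\", "q": "ts\\_h", "x": "s\\",
    "m": "m", "n": "n", "f": "f", "l": "l", "h": "x", "s": "s",
    "a": "a", "i": "i", "u": "u",
}
ENGLISH_TO_XSAMPA = {  # ARPAbet -> X-SAMPA
    "P": "p", "B": "b", "T": "t", "D": "d", "K": "k", "G": "g",
    "SH": "S", "TH": "T", "DH": "D", "NG": "N",
    "IY": "i", "AE": "{", "AA": "A",
    "M": "m", "N": "n", "F": "f", "L": "l", "S": "s", "HH": "h",
}


def to_xsampa(phones: list[str], lang: str) -> list[str]:
    """Map language-specific phone labels to shared X-SAMPA symbols."""
    tables = {"zh": MANDARIN_TO_XSAMPA, "en": ENGLISH_TO_XSAMPA}
    if lang not in tables:
        raise ValueError(f"unsupported language: {lang!r}")
    # A KeyError here means the phone is missing from these sketch tables.
    return [tables[lang][p] for p in phones]


def unified_phone_set() -> list[str]:
    """Union of both inventories: one symbol set covering both languages."""
    return sorted(set(MANDARIN_TO_XSAMPA.values()) | set(ENGLISH_TO_XSAMPA.values()))


def shared_phones() -> list[str]:
    """Symbols both languages map to, i.e. candidates for cross-language sharing."""
    return sorted(set(MANDARIN_TO_XSAMPA.values()) & set(ENGLISH_TO_XSAMPA.values()))


if __name__ == "__main__":
    print(to_xsampa(["SH", "IY"], "en"))  # "she" -> ['S', 'i']
    print(shared_phones())
```

With a single symbol inventory, a context such as a Chinese phone followed by an English phone can be written in one label file and queried by one question set, which is the practical point of the unification step.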