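This thesis implements a real-time mixed Chinese-English speech synthesis system, drawing implementation ideas from an earlier non-real-time mixed Chinese-English speech synthesis system built in our laboratory. We start with the design of the word segmenter, which is intended to cover words whose meaning is complete only as a Chinese-English mix, such as T恤, 好Cool, 很in, and 阿Sir. We then design the text-to-phoneme component. We first build a bilingual corpus and use the Extended SAM Phonetic Alphabet (X-SAMPA) to unify the Chinese and English phoneme sets; the phonemes of each language are obtained by table lookup and then merged. A bilingual Chinese-English recognizer performs forced alignment to obtain the phoneme segmentation information, and the models are finally trained with HTS. At synthesis time, the relevant contextual information is extracted and hts_engine reads the models to synthesize the speech. The experimental results show that our system is indeed faster than the previous non-real-time system, and the preference test shows that it is also more acceptable than the previous system. We also compare similarity and naturalness against the original speaker. The similarity scores for Chinese, English, and mixed Chinese-English are 3.3, 2.26, and 3.36, respectively. The naturalness scores for Chinese, English, and mixed Chinese-English are 3.18, 2.1, and 2.6, respectively. These results are broadly in line with our expectations.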
The goal of this study is to design a real-time mixed Chinese-English speech synthesis system. To improve the quality of the synthesized speech, we parse Chinese sentences with Yih-Ru Wang's Chinese parser instead of the Stanford parser, while still using the Stanford parser for English sentences. To improve synthesis speed, we implement our real-time system in C. In a subjective listening assessment, the overall acceptability in terms of mean opinion score (MOS) is 60%. In detail, the naturalness scores of Chinese, English, and mixed Chinese-English are 3.18, 2.1, and 2.6, respectively. The similarity scores of Chinese, English, and mixed Chinese-English are 3.3, 2.26, and 3.36, respectively. In conclusion, our system is acceptable, but there is still considerable room for improvement.
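
As an illustration of the word segmenter's goal stated in the first abstract above (keeping mixed tokens such as T恤, 好Cool, 很in, and 阿Sir as single words), the following is a minimal sketch and not the system's actual segmenter. It assumes a small hypothetical lexicon (`MIXED_LEXICON`) and a simple longest-match rule; the function names and the fallback behavior (ASCII letter runs as English words, everything else as single characters) are assumptions made only for this example.

```python
# Hypothetical lexicon of mixed Chinese-English words (examples from the abstract).
MIXED_LEXICON = {"T恤", "好Cool", "很in", "阿Sir"}
MAX_WORD_LEN = max(len(word) for word in MIXED_LEXICON)


def is_ascii_letter(ch):
    """True for A-Z and a-z only, so Chinese characters are excluded."""
    return ch.isascii() and ch.isalpha()


def segment(text):
    """Longest-match segmentation that keeps mixed lexicon words whole."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue

        # 1) Try the longest lexicon entry starting at position i.
        match = None
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate not in MIXED_LEXICON:
                continue
            end = i + length
            # Reject a match that would cut an English word in the middle,
            # for example "很in" inside "很interesting".
            if (end < len(text) and is_ascii_letter(text[end - 1])
                    and is_ascii_letter(text[end])):
                continue
            match = candidate
            break
        if match is not None:
            tokens.append(match)
            i += len(match)
            continue

        # 2) Otherwise, group a run of ASCII letters as one English word.
        if is_ascii_letter(text[i]):
            j = i
            while j < len(text) and is_ascii_letter(text[j]):
                j += 1
            tokens.append(text[i:j])
            i = j
            continue

        # 3) Fall back to a single character (a real segmenter would apply
        #    a Chinese word lexicon here instead).
        tokens.append(text[i])
        i += 1
    return tokens


if __name__ == "__main__":
    print(segment("我穿T恤很in"))
```

In a full pipeline, each token would then be routed to the Chinese or English pronunciation table before the phoneme sequences are merged; that step is omitted here.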