
Speaker Adaptive Speech Synthesis Technology for Personalized In-vehicle Information Broadcasting System

Abstract

A key issue in applying personalized speech synthesis to in-vehicle information broadcasting systems is how to efficiently collect recorded data for speaker adaptation. In this paper, we present two sentence selection approaches based on a greedy algorithm: one based on phone coverage and the other based on model coverage. The former considers the phonetic information of the adaptation data, while the latter considers how often the Mel-cepstral and logF0 models in the decision trees of the average voice model occur. To verify the feasibility of the proposed methods, we compare them with random selection in both objective and subjective evaluations. The objective evaluation results show that speech synthesized with the model coverage approach has lower Mel-cepstral distortion and lower logF0 root-mean-square error (RMSE). The subjective evaluation results indicate that both the phone coverage and model coverage approaches clearly outperform random selection.
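The abstract does not spell out the selection procedure, so the following is only a minimal sketch of a generic greedy coverage selection, not the authors' implementation. It assumes each candidate sentence has already been reduced to a set of unit labels: phone labels for a phone coverage variant, or (as a further assumption) identifiers of Mel-cepstral and logF0 decision-tree leaves for a model coverage variant. The function name `greedy_select`, the `budget` parameter, and the toy corpus are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(units_by_sentence: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentence IDs, each time taking the sentence that
    adds the most not-yet-covered units (a standard greedy set-cover step)."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(units_by_sentence)  # shallow copy; the input is not mutated
    while remaining and len(selected) < budget:
        # Gain = number of new units. Iterating in sorted order means ties
        # go to the smallest sentence ID, which keeps the result deterministic.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds anything new
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Hypothetical toy corpus: sentence ID -> set of unit labels it contains.
corpus = {
    "s1": {"a", "b", "k"},
    "s2": {"a", "b"},
    "s3": {"k", "s", "t", "a"},
    "s4": {"m"},
}
print(greedy_select(corpus, budget=2))
```

This sketch counts each unit once. A criterion based on occurrence counts, as the abstract describes for the model coverage approach, would need a weighted gain function in place of the plain set difference.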
