Variable Speech Rate Mandarin Chinese Text-to-Speech System

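This paper describes a variable speech rate Mandarin Chinese text-to-speech (TTS) system based on hidden Markov models (HMMs). The training data are parallel corpora recorded at three different speech rates. A context-dependent HMM is trained separately for each speech rate, and the speech rate is adjusted by assigning weights to the models of the different speech rates and interpolating them. In addition, the corpus shows that slow speech contains more silent pauses and fast speech fewer. The conventional, simple approach of placing silent pauses at punctuation marks is therefore unsuitable for variable speech rate synthesis. This study adds a pause prediction mechanism: a pause prediction decision tree is trained for each speech rate, and the probabilities from the decision trees of the different speech rates are interpolated with adjustable weights to predict silent pauses at any speech rate. To evaluate the system, we conducted objective and subjective tests. The objective tests assess pause prediction performance and measure the error between synthesized and target speech. The subjective tests compare the naturalness of synthesized speech under different combinations of the two sets of weights, namely the HMM weights and the pause decision tree weights. Experimental results show that the two sets of weights must match to synthesize more natural speech. We expect a system built with the proposed method to be better suited to human-machine interaction than conventional TTS systems with a single fixed speech rate.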
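
The abstract does not say exactly how the three speech-rate models are combined. A common scheme in HMM-based synthesis works per context label: each rate model's decision tree selects a leaf Gaussian for that label, and the selected leaves are interpolated. The sketch below assumes that scheme, using weighted sums of the means and of the diagonal variances. This is one of several possible variance rules. The function name `interpolate_gaussians` and all numbers are hypothetical. The example applies the rule to a state-duration Gaussian, which is how the speaking rate changes.

```python
import numpy as np

RATES = ("fast", "medium", "slow")


def interpolate_gaussians(means, variances, weights):
    """Combine the leaf Gaussians selected by each speech-rate model.

    means, variances: dict mapping rate -> 1-D array (diagonal covariance).
    weights: dict mapping rate -> non-negative weight, normalized here to sum to 1.
    Assumed scheme: mu = sum_k w_k * mu_k and var = sum_k w_k * var_k.
    """
    total = float(sum(weights[r] for r in RATES))
    if total <= 0.0 or any(weights[r] < 0 for r in RATES):
        raise ValueError("weights must be non-negative with a positive sum")
    mean = np.zeros_like(np.asarray(means[RATES[0]], dtype=float))
    var = np.zeros_like(mean)
    for r in RATES:
        w = weights[r] / total
        mean += w * np.asarray(means[r], dtype=float)
        var += w * np.asarray(variances[r], dtype=float)
    return mean, var


# Hypothetical state-duration Gaussians (in frames) for one HMM state.
dur_means = {"fast": [3.0], "medium": [5.0], "slow": [8.0]}
dur_vars = {"fast": [1.0], "medium": [2.0], "slow": [4.0]}

# Halfway between the fast and medium models.
mean, var = interpolate_gaussians(dur_means, dur_vars,
                                  {"fast": 0.5, "medium": 0.5, "slow": 0.0})
print(mean, var)  # → [4.] [1.5]
```

Moving weight toward the slow model lengthens the expected state duration, which slows the synthesized speech. The same interpolation can be applied to the spectral and F0 leaf Gaussians.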

Keywords

Text-to-Speech System; Mandarin Prosody; Speech Rate; Pause Prediction

Parallel Abstract

This paper presents a hidden Markov model (HMM)-based variable speech rate Mandarin Chinese text-to-speech (TTS) system. In this system, spectrum, fundamental frequency and state duration parameters are generated by a context-dependent HMM (CDHMM). The parameters of this CDHMM are linearly interpolated from those of three CDHMMs trained on corpora at three different speech rates (SRs): fast, medium and slow. In addition, three decision tree (DT)-based pause break predictors, trained on the three SR corpora, are used to interpolate the probabilities of inserting pause breaks. The performance of the proposed TTS system was evaluated with several objective and subjective tests. Experimental results suggest that coherence between the interpolation weights for the CDHMMs and those for the DT-based pause predictors is crucial for the naturalness of synthesized speech at variable SRs. We believe that the proposed variable speech rate Mandarin Chinese TTS system is better suited to human-machine interaction applications than conventional fixed-SR TTS systems.
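
Here is a minimal sketch of the pause-break interpolation. It assumes that each rate-specific decision tree has already produced the probability of inserting a pause at every word juncture. It also assumes that a pause is inserted when the weighted probability exceeds 0.5. The threshold, the function names `interpolate_pause_probs` and `predict_pauses`, and the probabilities are hypothetical; the abstract does not state them.

```python
from typing import Dict, List

RATES = ("fast", "medium", "slow")


def interpolate_pause_probs(probs: Dict[str, List[float]],
                            weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-juncture pause probabilities from each rate's tree."""
    total = sum(weights[r] for r in RATES)
    if total <= 0 or any(weights[r] < 0 for r in RATES):
        raise ValueError("weights must be non-negative with a positive sum")
    n = len(probs[RATES[0]])
    if any(len(probs[r]) != n for r in RATES):
        raise ValueError("every tree must score the same junctures")
    return [sum(weights[r] / total * probs[r][i] for r in RATES) for i in range(n)]


def predict_pauses(probs: Dict[str, List[float]],
                   weights: Dict[str, float],
                   threshold: float = 0.5) -> List[bool]:
    """Insert a pause at a juncture when the interpolated probability exceeds threshold."""
    return [p > threshold for p in interpolate_pause_probs(probs, weights)]


# Hypothetical tree outputs for three word junctures in one sentence.
tree_probs = {
    "fast":   [0.90, 0.20, 0.10],
    "medium": [0.95, 0.50, 0.30],
    "slow":   [0.99, 0.80, 0.60],
}
print(predict_pauses(tree_probs, {"fast": 1.0, "medium": 0.0, "slow": 0.0}))  # → [True, False, False]
print(predict_pauses(tree_probs, {"fast": 0.0, "medium": 0.0, "slow": 1.0}))  # → [True, True, True]
```

Putting all the weight on the slow tree inserts pauses at more junctures than putting it on the fast tree. This mirrors the corpus observation that slow speech contains more pauses. Using the same weights here as for the CDHMM interpolation follows the reported finding that the two weight sets should be coherent.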

Parallel Keywords

Text-to-Speech System; Mandarin Prosody; Speech Rate; Break Prediction

Cited By

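邱子軒 (2012). Mandarin speech recognition using prosodic information in building acoustic models [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2012.00841

謝宗佑 (2008). Robust speaker verification based on a latent prosody model [Master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2008.00376

劉于誠 (2016). An English hierarchical prosodic model with a flattened syntactic structure and its application to speech synthesis [Master's thesis, National Chiao Tung University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0030-0803201714404985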