
中英夾雜語音之階層式韻律架構建立與語音合成之應用

Prosody Hierarchy Construction for Mixed Chinese-English Spelling Speech and its Application to TTS

Advisor: Sin-Horng Chen (陳信宏)

Abstract


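This thesis addresses mixed Chinese-English sentences that are predominantly Chinese but contain English letters. Using the relationships between linguistic features and acoustic features, it builds a prosody model for mixed Chinese-English speech and performs automatic prosody labeling. Two kinds of prosodic tags are labeled: break tags and prosodic states. Break tags mark the boundaries of prosodic units, while the sequence of prosodic states represents the variation of higher-level prosodic units. By analyzing the trained model parameters, the thesis examines the relationships among break tags, acoustic features, linguistic features, and higher-level prosodic states. The experimental results show that the higher-level prosodic states of English letters rise and fall with the prosodic variation of the surrounding Chinese utterance, while break tags indicate stronger prosodic breaks at code-switch points. The prosodic hierarchy of noun phrases is also found to be highly correlated with their syntactic structure. Finally, two prosody generation methods based on this model are proposed. The first predicts break tags, derives context-dependent information at the prosodic-hierarchy level from them, and generates prosodic parameters through HTS. The second applies the prosody model directly to predict prosodic parameters. Objective evaluation shows that the first method improves on the prosodic parameters generated by conventional HTS, and that the second method is markedly effective for syllable duration prediction. Subjective evaluation shows that the first method achieves the best perceived naturalness, which indicates that the break tags predicted in this study capture more natural variation in prosodic rhythm.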
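As a rough illustration of how break tags can delimit prosodic units, the sketch below groups a syllable sequence into units by closing a unit at every juncture whose break level reaches a chosen threshold. The function name `split_by_breaks`, the integer break levels, and the toy utterance are assumptions made for this example; the thesis's actual break inventory is not given on this page.

```python
from typing import List, Sequence


def split_by_breaks(syllables: Sequence[str],
                    breaks: Sequence[int],
                    min_level: int) -> List[List[str]]:
    """Group syllables into units, closing a unit at each juncture whose
    break level is at least `min_level`.

    breaks[i] is the (hypothetical) break level of the juncture that
    follows syllables[i]; a higher number means a stronger break.
    """
    if len(syllables) != len(breaks):
        raise ValueError("need exactly one break tag per syllable")
    units: List[List[str]] = []
    current: List[str] = []
    for syl, level in zip(syllables, breaks):
        current.append(syl)
        if level >= min_level:
            units.append(current)
            current = []
    if current:  # keep any trailing syllables as a final unit
        units.append(current)
    return units


# Toy mixed Chinese-English utterance; the break levels are invented.
syllables = ["我", "用", "U", "S", "B", "傳", "檔", "案"]
breaks = [0, 2, 0, 0, 2, 0, 1, 3]

phrases = split_by_breaks(syllables, breaks, min_level=2)
words = split_by_breaks(syllables, breaks, min_level=1)
print(phrases)
print(words)
```

Running the same function with different thresholds yields nested groupings, which is one simple way to picture a hierarchy of prosodic units built from a single break-tag sequence.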

Parallel Abstract


In this thesis, an unsupervised joint prosody labeling and modeling (PLM) method for mixed Chinese-English word spelling speech is proposed. It labels an unlabeled corpus with two types of prosodic tags (i.e., the break type of each inter-syllable juncture and the prosodic state of each syllable) and builds four prosodic models simultaneously. The break tags can be used to delimit the prosodic constituents of a hierarchical prosody structure, and the prosodic states can be used to construct the prosodic feature patterns of those constituents. The four prosodic models describe the relationships among the acoustic prosodic features, the prosodic tags of utterances, and the linguistic features of the associated texts. The experimental results showed that prosodic variation in English word spelling was influenced both by the prosodic state, which describes the underlying intonation, and by a Chinese tone-borrowing effect. The relationship between hierarchical noun phrase structure and the corresponding break type was also analyzed. The analysis suggested that the magnitude of the break type was highly correlated with the syntactic hierarchy in a noun phrase. Lastly, we propose two PLM-based prosody generation methods for a mixed Chinese-English word spelling Text-to-Speech (TTS) system. In the first method, a break predictor is constructed using the CART method. The related linguistic features and the predicted break tags are then used to train an HMM-based Text-to-Speech system (HTS). In the second method, the PLM is used directly as a prosody generator. Experimental results confirmed that the first proposed method was superior, in both objective and subjective tests, to the conventional HTS that uses only linguistic features. The second proposed method was significantly better than the conventional HTS method at syllable duration prediction. We therefore conclude that the proposed PLM method was successful in prosody labeling and modeling for constructing a mixed Chinese-English word spelling TTS.
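To make the idea of a CART-based break predictor more concrete, here is a minimal sketch of a classification tree grown with Gini impurity and used to predict a break label for each juncture. It is a generic textbook-style tree, not the thesis's implementation: the three binary features (code-switch point, word boundary, punctuation), the labels `none`/`minor`/`major`, and the training rows are all invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Grow a tree: internal nodes are tuples, leaves are majority labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical juncture features: (is_code_switch, is_word_boundary, has_punctuation)
rows = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0),
        (1, 1, 0), (1, 1, 0), (0, 1, 1), (0, 0, 0)]
labels = ["none", "none", "minor", "minor",
          "major", "major", "major", "none"]

tree = grow(rows, labels)
print(predict(tree, (1, 1, 0)))  # juncture at a code-switch point
```

In a pipeline like the first method described above, the predicted labels would be added to the linguistic context features before HTS training; that step is outside the scope of this sketch.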

Keywords

speech synthesis; prosody labeling

