A POS-Based Word Segmentation Method for Improving Mandarin Chinese Text-to-Speech Systems

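This thesis proposes a part-of-speech (POS) based Mandarin word segmentation method to improve Mandarin text-to-speech (TTS) systems. POS is chosen for three reasons. First, the POS of adjacent words usually combine according to certain rules. Second, each character has only a few common POS. Together these two properties help with the unknown-word problem in word segmentation. Third, POS affects how polyphonic characters are read, which addresses the polyphone problem common in Mandarin speech synthesis. The thesis mainly uses a specialized hidden Markov model (Specialized HMM) to perform Mandarin word segmentation: the specialization extends the state symbols with POS, while the observation symbols remain the original Mandarin characters. Because the segmentation here is intended for Mandarin speech synthesis, its segmentation standard differs from that of general information processing, and words are merged according to POS rules before training. Experimental results confirm that adding POS improves the accuracy of each segmentation method tested. Another common problem in Mandarin word segmentation is ambiguity. To address it, the thesis combines the POS-based specialized HMM with the maximum-matching (longest-word-first) HMM (M-HMM) through a set of criteria; the result is called the selective specialized HMM. It combines the strengths of both methods to handle unknown words and ambiguity, and the experiments confirm a further gain in segmentation accuracy.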
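As a rough illustration of what "extending the state symbols with POS" could look like, the following Python sketch turns a POS-tagged, segmented sentence into (character, state) training pairs. The B/I/E/S position tags, the `N`/`V` POS labels, and both function names are assumptions made for this example; the abstract does not specify the thesis's actual state inventory or tag set.

```python
from typing import List, Tuple


def to_specialized_states(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (word, POS) pairs to (character, state) pairs.

    Each state joins a position tag (B, I, E, or S; an assumed scheme)
    with the word's POS, so the state alphabet is extended by POS while
    the observations stay plain characters.
    """
    pairs = []
    for word, pos in words:
        if len(word) == 1:
            tags = ["S"]  # single-character word
        else:
            tags = ["B"] + ["I"] * (len(word) - 2) + ["E"]
        for ch, tag in zip(word, tags):
            pairs.append((ch, f"{tag}-{pos}"))
    return pairs


def states_to_words(pairs: List[Tuple[str, str]]) -> List[str]:
    """Recover word boundaries from a (character, state) sequence."""
    words, current = [], ""
    for ch, state in pairs:
        current += ch
        if state[0] in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # flush an unterminated trailing word
        words.append(current)
    return words


# Hypothetical tagged sentence; the POS labels are illustrative only.
example = [("我", "N"), ("喜歡", "V"), ("音樂", "N")]
pairs = to_specialized_states(example)
```

Pairs like these would supply the transition and emission counts for an HMM whose decoded state sequence yields both a segmentation and a POS for each word, which is why a single model can also inform polyphone disambiguation.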

Keywords

Mandarin word segmentation; Mandarin text-to-speech system; part of speech; hidden Markov model

Parallel Abstract

This thesis proposes a part-of-speech (POS) based word segmentation method for improving the speech quality produced by a Mandarin Chinese text-to-speech (TTS) system. POS information is adopted in word segmentation for three reasons. First, the collocation of POS tags usually follows certain syntactic rules. Second, every Mandarin character belongs to only a small set of POS tags. These two properties help solve the unseen-word problem in word segmentation. Third, the pronunciation of a polyphonic character usually depends on its POS. In this thesis, POS information is incorporated into specialized hidden Markov models (Specialized HMM). In this approach, POS is used to extend the state symbols, while the observation symbols remain Mandarin characters as before. Since the word segmentation described in this thesis is designed for a Mandarin Chinese TTS system, words are segmented differently from the standards used in information processing. Hence, according to some observed POS rules, certain words are combined into a single word before training. Experimental results show that adding POS information improves segmentation accuracy. Another frequently seen problem is segmentation ambiguity. To solve it, we combine POS-based specialized HMMs with maximum-matching HMMs (M-HMM); the combination, called selective specialized HMMs, gains the benefits of both methods and compensates for their weaknesses with respect to the unseen-word and segmentation-ambiguity problems. Experimental results show that selective specialized HMMs further improve segmentation accuracy over POS-based specialized HMMs.
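For readers unfamiliar with maximum matching, the sketch below shows the plain forward longest-word-first heuristic in Python. It is not the thesis's M-HMM, which the abstract does not detail; the lexicon, the `max_len` limit, and the function name `max_match` are assumptions made for illustration.

```python
from typing import List, Set


def max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching (longest word first).

    At each position, try the longest candidate first and shrink until a
    lexicon entry is found; a single character is always accepted, so
    characters not covered by the lexicon come out as one-character words.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon.
lexicon = {"語音", "語音合成", "合成", "系統"}
segmented = max_match("語音合成系統", lexicon)
```

Because the heuristic depends entirely on the lexicon, it fragments unseen words into single characters, which is the kind of weakness that a character-level, POS-aware model is meant to offset.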

Cited By

徐培霖 (2012). 基於特徵替換法對語者調適語音合成之改進 [Master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2002201315383235
