This thesis proposes a part-of-speech (POS) based word segmentation method to improve the speech quality of a Mandarin Chinese text-to-speech (TTS) system. POS information is used for three reasons. First, the POS tags of adjacent words usually follow regular collocation patterns. Second, each Mandarin character commonly occurs with only a small set of POS tags. These two properties help address the unknown word problem in word segmentation. Third, the pronunciation of a polyphonic character often depends on its POS, so POS information also helps with the polyphonic character problem that is common in Mandarin TTS.

Word segmentation is performed with specialized hidden Markov models (Specialized HMMs). The specialization extends the state symbols with POS information, while the observation symbols remain the original Mandarin characters. Because the segmentation is designed for Mandarin TTS, its segmentation standard differs from the standards used in general information processing: according to POS rules, certain words are merged into a single word before training. Experimental results show that adding POS information improves the accuracy of the various segmentation methods evaluated.

Another common problem in Mandarin word segmentation is ambiguity. To address it, the POS-based Specialized HMM is combined with the maximum matching HMM (M-HMM), which prefers longer words, through a set of selection criteria. The combined model, called the selective Specialized HMM, draws on the strengths of both methods to handle unknown words and segmentation ambiguity. Experimental results show that it further improves segmentation accuracy over the POS-based Specialized HMM.
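The two ingredients named above, POS-extended state symbols and long-word-first matching, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The B/M/E/S position tags, the `N`/`V` POS labels, the function names, and the toy lexicon are all assumptions introduced here for illustration; the abstract does not specify the actual state inventory, the POS tag set, or the criteria used to combine the two models, and no HMM training or decoding is shown.

```python
from typing import List, Set, Tuple


def to_specialized_states(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, POS) pairs into per-character (character, state) pairs.

    Each state is a position tag (assumed B/M/E/S scheme) extended with
    the POS of the word, e.g. "B-N". The observation stays the character.
    """
    pairs = []
    for word, pos in tagged_words:
        last = len(word) - 1
        for i, ch in enumerate(word):
            if last == 0:
                tag = "S"  # single-character word
            elif i == 0:
                tag = "B"  # first character of a word
            elif i == last:
                tag = "E"  # last character of a word
            else:
                tag = "M"  # interior character
            pairs.append((ch, f"{tag}-{pos}"))
    return pairs


def states_to_words(chars: str, states: List[str]) -> List[Tuple[str, str]]:
    """Recover (word, POS) pairs from a decoded state sequence."""
    words = []
    current = ""
    pos = ""
    for ch, state in zip(chars, states):
        tag, pos = state.split("-", 1)
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append((current, pos))
            current = ""
    if current:  # sequence ended mid-word; keep what was collected
        words.append((current, pos))
    return words


def forward_maximum_matching(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Generic long-word-first heuristic: at each position, take the
    longest lexicon entry; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    tagged = [("我", "N"), ("喜歡", "V"), ("圖書館", "N")]  # hypothetical POS labels
    pairs = to_specialized_states(tagged)
    print(pairs)
    chars = "".join(ch for ch, _ in pairs)
    print(states_to_words(chars, [state for _, state in pairs]))
    print(forward_maximum_matching("語音合成系統", {"語音", "合成", "語音合成", "系統"}))
```

In this sketch an HMM decoder would predict the extended states character by character, and `states_to_words` would read both word boundaries and POS tags off the result. The maximum matching function shows only the plain dictionary heuristic, not how the thesis integrates it into an HMM.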