Implementation of a Prosody-Assisted Mandarin Speech Recognition System (以韻律輔助之中文語音辨認系統之實現)

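This study proposes a new method for integrating prosodic information into Mandarin large-vocabulary continuous speech recognition. Unlike earlier work that used only a few prosodic cues to aid recognition, this study uses the previously developed PLM algorithm to automatically train 12 prosodic models from a large speech corpus without manual annotation. The models are incorporated into a two-stage automatic speech recognition (ASR) system, where they rescore the word lattice produced by the first stage, a conventional HMM recognizer, to obtain a more accurate recognized word sequence. In addition, the second stage simultaneously decodes further information: the part of speech (POS) of each word, the punctuation mark (PM) following each word, and two types of prosodic tags that can be used to construct the hierarchical prosodic structure of the test speech. Experiments use the TCC300 corpus, which contains long read-speech utterances, and introduce a factored language model that models the relationships among words, POS, and PMs in order to provide a stronger baseline. With all prosodic information incorporated, the word, character, and syllable error rates are 20.1%, 13.6%, and 9.4%, respectively, which are absolute reductions of 4.1, 4.0, and 2.4 percentage points over the baseline (relative reductions of 16.9%, 22.6%, and 20.6%). Error analysis shows that the system successfully corrects many tone and word recognition errors.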
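The abstract describes rescoring the first-stage word lattice with prosodic models but does not give the scoring formula. The following is only a minimal sketch, assuming each lattice arc carries acoustic, language-model, and prosodic log-scores that are combined log-linearly, with the best path found by a Viterbi-style dynamic-programming pass over the lattice treated as a directed acyclic graph. The `Arc` class, the `rescore_lattice` function, the weights, and the toy scores are all hypothetical. This is not the PLM algorithm or the thesis's decoder, which additionally decodes POS, PM, and prosodic tags.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One word hypothesis in a lattice (all scores are log-domain)."""
    start: int       # source node id
    end: int         # destination node id
    word: str
    acoustic: float  # first-stage HMM acoustic log-likelihood
    lm: float        # language-model log-probability
    prosody: float   # hypothetical prosodic-model log-likelihood


def rescore_lattice(arcs, start, final, lm_weight=1.0, prosody_weight=0.5):
    """Best path through a word lattice (a DAG) under a log-linear score.

    Each arc contributes acoustic + lm_weight * lm + prosody_weight * prosody;
    the highest-scoring start-to-final path is found by dynamic programming
    over the nodes in topological order.
    """
    out_arcs = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {start, final}
    for arc in arcs:
        out_arcs[arc.start].append(arc)
        indegree[arc.end] += 1
        nodes.update((arc.start, arc.end))

    # Kahn's algorithm: a node is visited only after all of its predecessors.
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        node = ready.pop()
        order.append(node)
        for arc in out_arcs[node]:
            indegree[arc.end] -= 1
            if indegree[arc.end] == 0:
                ready.append(arc.end)
    if len(order) != len(nodes):
        raise ValueError("lattice contains a cycle")

    best = {start: 0.0}  # best partial score reaching each node
    back = {}            # arc used to reach each node on its best path
    for node in order:
        if node not in best:
            continue  # node is not reachable from start
        for arc in out_arcs[node]:
            score = (best[node] + arc.acoustic + lm_weight * arc.lm
                     + prosody_weight * arc.prosody)
            if arc.end not in best or score > best[arc.end]:
                best[arc.end] = score
                back[arc.end] = arc
    if final not in best:
        raise ValueError("final node is unreachable")

    # Trace back from the final node to recover the word sequence.
    words, node = [], final
    while node != start:
        arc = back[node]
        words.append(arc.word)
        node = arc.start
    return best[final], words[::-1]


# Toy lattice: the single arc "ab" competes with the two-word path "a" + "b".
arcs = [
    Arc(0, 2, "ab", acoustic=-10.0, lm=-2.0, prosody=-1.0),
    Arc(0, 1, "a", acoustic=-4.0, lm=-1.0, prosody=-3.0),
    Arc(1, 2, "b", acoustic=-5.0, lm=-1.5, prosody=-3.0),
]
print(rescore_lattice(arcs, 0, 2, prosody_weight=0.0))  # → (-11.5, ['a', 'b'])
print(rescore_lattice(arcs, 0, 2, prosody_weight=0.5))  # → (-12.5, ['ab'])
```

With the prosodic term switched off, the two-word path wins; giving it weight 0.5 flips the decision to the single-word hypothesis. This shows in miniature how an extra knowledge source can change word segmentation during rescoring. In practice such weights are typically tuned on held-out data rather than fixed by hand.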

Keywords

Prosody-assisted automatic speech recognition; prosody modeling; hierarchical prosody model

Parallel Abstract

This thesis presents a new prosody-assisted ASR system for Mandarin speech. Unlike conventional approaches that rely on simple prosodic cues, it employs a sophisticated prosody modeling approach, using the previously proposed PLM algorithm to automatically generate 12 prosodic models from a large unlabeled speech database. These 12 prosodic models are incorporated into a two-stage ASR system to rescore the word lattice generated by a conventional HMM recognizer in the first stage, yielding a more accurate recognized word string. The system also decodes additional information, including part-of-speech (POS) tags, punctuation marks (PMs), and two types of prosodic tags that can be used to construct the prosodic hierarchy of the test speech. Experimental results on the TCC300 database, which consists of long paragraph-length utterances, showed that the proposed system significantly outperformed a baseline that uses a factored LM to model words, POS, and PMs. Word, character, and base-syllable error rates of 20.1%, 13.6%, and 9.4% were obtained, corresponding to absolute error reductions of 4.1, 4.0, and 2.4 percentage points (16.9%, 22.6%, and 20.6% relative). Error analysis showed that many word segmentation and tone recognition errors were corrected.
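The absolute and relative reductions are linked by simple arithmetic: the baseline rate equals the new rate plus the absolute reduction, and the relative reduction is the absolute reduction divided by that baseline. The short check below recomputes the relative figures from the rounded rates in the abstract, assuming each published rate was rounded to one decimal place (±0.05 points). The function names `relative_reduction` and `rounding_interval` are illustrative only.

```python
def relative_reduction(new_rate, absolute_gain):
    """Relative error reduction (%) from the new error rate and the absolute gain (both in %)."""
    baseline = new_rate + absolute_gain  # baseline rate before the improvement
    return 100.0 * absolute_gain / baseline


def rounding_interval(new_rate, absolute_gain, half_step=0.05):
    """Lowest and highest relative reduction consistent with inputs rounded to 0.1."""
    # The ratio a / (n + a) grows with a and shrinks with n (for positive n, a).
    low = relative_reduction(new_rate + half_step, absolute_gain - half_step)
    high = relative_reduction(new_rate - half_step, absolute_gain + half_step)
    return low, high


# (unit, new error rate %, absolute reduction %, reported relative reduction %)
RESULTS = [
    ("word", 20.1, 4.1, 16.9),
    ("character", 13.6, 4.0, 22.6),
    ("base-syllable", 9.4, 2.4, 20.6),
]

for unit, new_rate, gain, reported in RESULTS:
    point = relative_reduction(new_rate, gain)
    low, high = rounding_interval(new_rate, gain)
    print(f"{unit}: baseline {new_rate + gain:.1f}%, relative {point:.1f}% "
          f"(rounding interval {low:.1f}-{high:.1f}%), reported {reported}%")
```

From the rounded figures, the implied baselines are 24.2%, 17.6%, and 11.8%, and the point estimates are 16.9%, 22.7%, and 20.3%. The last two differ slightly from the reported 22.6% and 20.6%, but each reported value falls inside the interval allowed by one-decimal rounding, which is consistent with the relative figures having been computed from unrounded error rates.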

Parallel Keywords

Prosody-assisted ASR; Prosody modeling; Prosody-hierarchy model

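Cited By

吳孟謙 (2015). 以韻律訊息輔助中文自發性語音辨認之改進 (Improving Mandarin spontaneous speech recognition with prosodic information) [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2015.00004

邱子軒 (2012). 使用韻律訊息於建立聲學模型之中文語音辨認 (Mandarin speech recognition using prosodic information in acoustic model construction) [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2012.00841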