
中文自發性語音之階層式韻律模型

Hierarchical Prosody Modeling for Mandarin Spontaneous Speech

Advisor: 陳信宏

Abstract


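This dissertation presents a method for building an unsupervised hierarchical prosody model (HPM) designed around the characteristics of Mandarin spontaneous speech. At the upper prosodic layers, we combine the prosodic states of fluent and disfluent speech; at the lower prosodic layer, we add affecting factors for various spontaneous-speech phenomena, such as contraction, lengthening, particles, and markers, to the syllable models. The method was successfully validated on the Mandarin Conversational Dialogue Corpus (MCDC) of Academia Sinica, and all utterances were labeled with break and prosodic state tags. We then compared the HPM parameters and labeling results with those of an earlier HPM for read speech and found many important differences in prosodic phenomena. We also used the labeling results to analyze the prosody of two common disfluency phenomena, word repetition and word repair, and summarized their prosody marking behavior according to their different pragmatic functions. In spontaneous-speech recognition experiments incorporating the HPM, errors in syllable, character, tone, and word recognition were reduced by 9.0%, 9.2%, 15.6%, and 7.3%, respectively.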

Parallel Abstract


In this dissertation, an unsupervised hierarchical prosody modeling (HPM) method for Mandarin spontaneous speech is first developed by extending our previous research on read speech. The prosodic states of fluent and disfluent speech are combined at the higher prosodic layers, while affecting factors for various spontaneous-speech phenomena, such as syllable contraction, lengthening, and types of particles and markers, are designed and added to the prosodic-acoustic models at the lower prosodic layer. The proposed method was successfully validated on the MCDC database provided by Academia Sinica, and all syllables in the corpus were automatically annotated with break and prosodic state tags. Next, we compared the HPM parameters and labeling results with their counterparts in another HPM trained on read speech, and found many critical prosodic differences between spontaneous and read speech. The prosody labeling results for MCDC were also used to analyze two common types of disfluency, repetition and repair, and the prosodic marking behavior was summarized according to different pragmatic functions. Finally, the HPM was applied to assist Mandarin spontaneous-speech recognition, yielding relative error rate reductions of 9.0%, 9.2%, 15.6%, and 7.3% for base-syllable, character, tone, and word recognition, respectively.
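
The recognition results above are reported as relative error rate reductions. As a minimal sketch of how such a figure is conventionally computed, the snippet below uses the standard definition (baseline error minus new error, divided by baseline error). The function name and the two error rates are hypothetical and are not taken from the dissertation, which reports only the relative reductions.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative reduction, in percent, from baseline_error to new_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only for illustration.
# They are NOT results from the dissertation.
hypothetical_baseline = 50.0
hypothetical_with_prosody_model = 45.0

reduction = relative_error_reduction(hypothetical_baseline, hypothetical_with_prosody_model)
print(f"{reduction:.1f}% relative error rate reduction")
```

Under this definition, a relative reduction says nothing about the absolute error rates themselves, so the same percentage can correspond to very different baselines.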

