Prosody Labeling and Prosodic Modeling of Mandarin Spontaneous Speech

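Prosody is highly useful information in many areas of speech processing research, but using it requires a large labeled corpus and statistical methods. Because corpus labeling is very time-consuming and labor-intensive, especially for spontaneous speech, very few prosody-labeled Mandarin spontaneous speech corpora exist today. This study therefore performs prosody labeling on the Mandarin Conversational Dialogue Corpus (MCDC) provided by Academia Sinica. The labeling is done automatically, and the prosodic variations of spontaneous speech are investigated. Using prosodic features extracted from the speech signal and linguistic features extracted from the text, the corpus is labeled with an unsupervised method while its prosodic models are trained. Two kinds of prosody tags are labeled: break tags and prosodic states. Break tags mark the boundaries of prosodic units, while sequences of prosodic states represent the variations of higher-level prosodic units. By analyzing the parameters of the trained models, this study then examines how high-level prosody varies in spontaneous speech and how prosody tags, prosodic features, and linguistic features relate to one another. It also analyzes characteristics of spontaneous speech that are absent from read speech. After automatic labeling, many prosodic variation phenomena were found in the prosodic behavior and special phenomena of spontaneous speech, as well as in speech repairs. These findings can provide useful information for future research on spontaneous speech.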
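To make the role of break tags concrete, the sketch below splits a syllable sequence into nested prosodic constituents, one break tag per inter-syllable juncture. It assumes a simplified, hypothetical break inventory `B0` to `B4` in which a break at or above a given level closes a constituent of that level (for example, `B2` closing a prosodic word, `B3` a prosodic phrase, `B4` a breath group). The tag set, the level mapping, and the function name `segment` are illustrative assumptions, not the tag set or code used in this study.

```python
from typing import List

# Hypothetical, simplified break inventory (an assumption, not this study's tag set):
# a break whose level is >= the requested constituent level closes that constituent.
LEVEL = {"B0": 0, "B1": 1, "B2": 2, "B3": 3, "B4": 4}


def segment(syllables: List[str], breaks: List[str], level: int) -> List[List[str]]:
    """Split syllables into constituents closed by breaks at or above `level`.

    breaks[i] is the juncture after syllables[i]; the last entry is the
    utterance-final break.
    """
    if len(syllables) != len(breaks):
        raise ValueError("need exactly one break tag per syllable")
    units: List[List[str]] = []
    current: List[str] = []
    for syl, brk in zip(syllables, breaks):
        current.append(syl)
        if LEVEL[brk] >= level:
            units.append(current)
            current = []
    if current:  # trailing syllables with no closing break
        units.append(current)
    return units
```

With the illustrative (made-up) tags `["B1", "B2", "B1", "B3", "B1", "B2", "B1", "B4"]` on the syllables of 我們今天去看電影, level 2 yields four two-syllable units, level 3 yields two four-syllable units, and level 4 yields a single unit, which shows how one tag sequence encodes a whole hierarchy.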

Keywords

Spontaneous speech; prosody

English Abstract

In recent years, prosodic information has been widely used in spontaneous speech processing. In previous work, prosodic features are first extracted from a speech corpus labeled with prosody tags or boundary types; prosodic models are then built and applied to the target tasks. However, preparing a large spontaneous speech corpus with properly labeled prosody tags is generally difficult, and the quality of human labeling cannot be guaranteed even when experienced annotators are involved. In this thesis, the prosody of Mandarin spontaneous speech is investigated using the unsupervised joint prosody labeling and modeling (PLM) method previously proposed for read speech. The method labels an unlabeled Mandarin spontaneous speech corpus with two types of prosody tags, namely the break type of each inter-syllable juncture and the prosodic state of each syllable, and builds four prosodic models simultaneously. The break tags delimit the prosodic constituents of a hierarchical prosody structure, and the prosodic states can be used to construct the prosodic-feature patterns of those constituents. The four prosodic models describe the relationships among the acoustic prosodic features, the prosody tags of utterances, and the linguistic features of the associated texts. Experimental results on an unlabeled dialogue corpus, the Mandarin Conversational Dialogue Corpus (MCDC), confirmed the method's effectiveness. Many meaningful characteristics of spontaneous-speech prosody were explored from the parameters of the trained prosodic models, and the patterns of high-level prosodic constituents in the prosody hierarchy were derived. Disfluencies were also analyzed in relation to the labeling results. These results can provide rich prosodic information for automatic speech recognition (ASR).
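The PLM method labels and models without hand-labeled data by alternating between assigning tags and re-estimating models. As a drastically simplified illustration of that alternating loop only, the sketch below labels each juncture as "short/no pause" (0) or "long pause" (1) from its pause duration alone, using a two-class one-dimensional k-means (hard EM) procedure. This is a swapped-in toy technique, not the PLM method itself, which jointly uses four prosodic models together with linguistic features; the function name `hard_em_breaks` and the two-class setup are hypothetical.

```python
from typing import List, Tuple


def hard_em_breaks(pauses: List[float], iters: int = 50) -> Tuple[List[int], List[float]]:
    """Toy unsupervised labeling of junctures into two classes by pause duration.

    Returns (labels, means): label 0 = short/no pause, label 1 = long pause.
    """
    if not pauses:
        return [], []
    # Initialize the two class means from the extremes of the data.
    means = [min(pauses), max(pauses)]
    labels = [0] * len(pauses)
    for _ in range(iters):
        # Labeling step: assign each juncture to the nearest class mean.
        new_labels = [0 if abs(p - means[0]) <= abs(p - means[1]) else 1 for p in pauses]
        # Re-estimation step: recompute each class mean from its members.
        for k in (0, 1):
            members = [p for p, lab in zip(pauses, new_labels) if lab == k]
            if members:
                means[k] = sum(members) / len(members)
        if new_labels == labels:
            break  # labels unchanged: converged
        labels = new_labels
    return labels, means
```

For hypothetical pause durations (in seconds) `[0.0, 0.02, 0.01, 0.45, 0.03, 0.6, 0.0, 0.5]`, the loop settles after two passes on labels `[0, 0, 0, 1, 0, 1, 0, 1]`. A real system would replace the single feature and nearest-mean rule with richer acoustic and linguistic models, as the abstract describes.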

English Keywords

Spontaneous Speech; Prosody


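Cited By

游俊龍 (2015). Improvements to the Acoustic and Prosodic Models for Mandarin Spontaneous Speech [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2015.00714

許誌宏 (2010). A Mandarin Spontaneous Speech Recognition System [Master's thesis, National Chiao Tung University]. Airiti Library. https://doi.org/10.6842/NCTU.2010.00890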
