  • Thesis

國語語音之發音變異分析及提昇辨識效能之發音模型

Pronunciation Variation Analysis and Modeling for Mandarin Chinese for Improved Speech Recognition

Advisor: Lin-shan Lee (李琳山)

Abstract


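This thesis has two main parts. The first presents an in-depth quantitative analysis of pronunciation variation in Mandarin speech; the second develops pronunciation variation models to improve recognition performance. In the first part, we use several statistical methods to analyze pronunciation variation in speech signals, including the newly proposed acoustic distance and phonemic distance, as well as pronunciation entropy and phonological rules. Chapter 3 analyzes the average entropy at four levels of linguistic unit (sub-syllable, i.e. Initial/Final; syllable; character; and word) under different speaking rates, word frequencies, and contexts, in order to observe where pronunciation variation occurs. Chapter 4 proposes a new analysis framework that considers acoustic distance and phonemic distance together; based on this framework, we analyze how confusable Initials/Finals or phonemes are acoustically (acoustic distance) and in pronunciation (phonemic distance). To better understand when pronunciation variation occurs, Chapter 5 automatically derives statistical phonological rules from speech signals and analyzes them, adding to our understanding of pronunciation variation in Mandarin. These analyses use Mandarin large-vocabulary corpora, including broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME). In addition, although allowing some words in the lexicon to have multiple pronunciations does improve recognition accuracy, the extra pronunciations also increase the confusability among words during recognition and therefore limit the achievable improvement. To reduce this confusability, Chapter 6 in the second part proposes a new framework for automatically building pronunciation variation models. The framework has three main steps: generating pronunciation variation information, ranking the pronunciation variants, and selecting the pronunciation variants. Chapter 7 proposes new methods for measuring the confusability of the pronunciations in a lexicon. The experimental results show that the measured confusability is highly correlated with recognition accuracy, and that the pronunciation variation lexicon built with this framework effectively reduces both confusability and the recognition error rate. To minimize confusability, Chapter 8 proposes a fast discriminative training framework for training the pronunciation probabilities in the lexicon. The framework uses an integrated model that simulates speech production and recognition to obtain simulated recognition errors, which makes training fast and effective. These experiments were carried out on Mandarin large-vocabulary speech recognition, using broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME) corpora.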
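
As a rough illustration of the pronunciation entropy mentioned above, the sketch below computes the Shannon entropy (in bits) of the surface pronunciations observed for a single linguistic unit. The function name `pronunciation_entropy`, the input format, and the example pronunciations are assumptions made for illustration; the abstract does not give the thesis's exact formulation.

```python
import math
from collections import Counter


def pronunciation_entropy(observed):
    """Shannon entropy (bits) of the surface pronunciations of one unit.

    `observed` is a list of surface pronunciation strings, one per token
    of the unit in the corpus. This input format is a hypothetical choice.
    """
    counts = Counter(observed)
    total = sum(counts.values())
    # H = -sum_v p(v) * log2 p(v), with p(v) estimated by relative frequency
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A unit always pronounced the same way has zero entropy;
# two equally likely variants give exactly one bit.
stable = pronunciation_entropy(["sh i", "sh i", "sh i"])
variable = pronunciation_entropy(["sh i", "s i", "sh i", "s i"])
```

Averaging this quantity over all units at one level (sub-syllable, syllable, character, or word), grouped by speaking rate or word frequency, would give per-condition averages of the kind the abstract describes.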

Parallel Abstract


This thesis consists of two parts, one on pronunciation variation analysis and the other on pronunciation modeling, both for Mandarin Chinese. In the first part, pronunciation variation in Mandarin Chinese is analyzed extensively and quantitatively. Several statistical methods are used for the analysis, including the proposed acoustic and phonemic distances in addition to pronunciation entropy and phonological rules. Pronunciation entropy is used to analyze how pronunciation variation at different linguistic levels depends on contextual conditions, speaking rate, and word frequency. The proposed framework based on the acoustic and phonemic distances is used to analyze the acoustic and phonemic confusion between Initials/Finals or phonemes. Furthermore, probabilistic phonological rules are derived automatically from speech data to analyze phonological transformations under various contextual conditions. All of these analyses were carried out on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora. Multiple-pronunciation dictionaries have been found to be useful in pronunciation modeling for speech recognition. However, the extra pronunciation variants added to the dictionary inevitably increase the confusion among different words during recognition, and consequently limit the achievable improvement in recognition performance. The second part of this thesis therefore proposes a three-stage framework for Mandarin Chinese that automatically constructs the multiple-pronunciation dictionary while reducing the confusion it may cause. The proposed framework consists of pronunciation generation (Stage 1), ranking (Stage 2), and pruning (Stage 3). New measures of confusability for multiple-pronunciation dictionaries are developed and shown to correlate very strongly with recognition performance. With the proposed framework, the measured confusability is reduced and recognition performance is improved stage by stage. To further reduce possible confusion during recognition, the pronunciation probabilities in the multiple-pronunciation dictionaries are re-estimated within a proposed rapid discriminative training framework, using simulated recognition errors based on a Speech Production/Recognition Model. The experimental results show that recognition performance improves over the training iterations. These findings were verified by a series of experiments on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
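
The three stages named above (generation, ranking, pruning) can be pictured with the minimal sketch below. It is an assumption-heavy illustration, not the thesis's method: the abstract does not state the ranking or pruning criteria, so this sketch ranks variants by relative frequency and prunes with a hypothetical top-k limit and probability floor. The function name `build_pron_dict`, the parameters `max_variants` and `min_prob`, and the example data are all invented for illustration.

```python
from collections import Counter


def build_pron_dict(alignments, max_variants=2, min_prob=0.1):
    """Build a multiple-pronunciation dictionary from (word, pron) pairs.

    `alignments` stands in for the Stage 1 output: surface pronunciations
    observed for each word. Ranking by relative frequency and pruning by
    top-k plus a probability floor are assumptions, not the thesis's
    criteria.
    """
    counts = {}
    for word, pron in alignments:
        counts.setdefault(word, Counter())[pron] += 1

    lexicon = {}
    for word, c in counts.items():
        total = sum(c.values())
        # Stage 2 (assumed): rank by count, ties broken alphabetically
        ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
        # Stage 3 (assumed): keep at most max_variants above the floor
        kept = [(p, n / total) for p, n in ranked[:max_variants]
                if n / total >= min_prob]
        # Renormalize so the kept pronunciation probabilities sum to 1
        norm = sum(prob for _, prob in kept)
        lexicon[word] = [(p, prob / norm) for p, prob in kept]
    return lexicon


observed = ([("zhe", "zh e")] * 6 + [("zhe", "z e")] * 3
            + [("zhe", "zh ei")] * 1)
lexicon = build_pron_dict(observed, max_variants=2, min_prob=0.2)
```

Tightening `max_variants` or raising `min_prob` trades coverage of real variation against added confusability among words, which is the tension the abstract describes.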


Cited By


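程永任 (2008). Evaluation and Analysis of Minimum Phone Error Training and Its Improved Approaches for Mandarin Large-Vocabulary Recognition [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2008.02662
許銘凱 (2013). A Study of Methods for Automatically Determining Whether Sung Lyrics Are Correct [Master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-0402201321402100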