This dissertation first presents a set of algorithms for automated Mandarin pronunciation learning, together with a prototype demonstration system. The system uses forced alignment with hidden Markov models to segment each phoneme and computes the log probability of the corresponding acoustic model for a ranking-based confidence measure. The pitch data of each monosyllable is then modeled with Gaussian mixture models for tone recognition. We also compute similarity scores for intensity and rhythm between the target and test utterances. The four scoring functions for phoneme, tone, intensity, and rhythm are all expressed as parametric functions, and the final overall score is a linear combination of these four. Since the overall score involves both linear and nonlinear parameters, we use the downhill simplex search to fine-tune these parameters to approximate human subjective scores. Experimental results show that the system's scores are highly consistent with human subjective evaluation.

Furthermore, this study of pronunciation learning shows that tone is fundamental and important to the pronunciation and recognition of tonal languages: whether tones are recognized correctly greatly affects the quality of pronunciation assessment. We therefore also propose an innovative method to improve tone recognition. Most previous work on tone recognition adopts two-stage processing: an acoustic model first segments the utterance into syllables via forced alignment, and then classifiers such as neural networks, Gaussian mixture models, hidden Markov models, and support vector machines are trained on the segmented syllables as tone models. However, forced alignment does not guarantee phoneme boundaries as accurate as human judgment, so the performance of tone models may degrade due to poor determination of voiced regions. To reduce the impact of this problem, we propose a robust HMM-based method for continuous-speech tone recognition, called TRUES (tone recognition using extended segments). This method extracts time-domain AMDF (average magnitude difference function) features from the whole utterance and then, via dynamic-programming optimization, extracts a continuous, unbroken pitch contour for the entire sentence. The pitch contour of each syllable is extended on both sides to train left- and right-context-dependent tone models, so as to increase the useful tonal features and the models' discriminability, and to reduce the impact of segmentation errors on the tone models. Experimental results indicate that on our self-recorded Tang-poetry corpus, the proposed TRUES achieves a 49.13% relative error rate reduction over the supratone model newly proposed in 2007; in our tests, the supratone model already outperforms other recent related work. This encouraging result demonstrates the robustness and effectiveness of the proposed TRUES, as well as the advantages of the proposed dynamic-programming-based unbroken pitch-tracking method.
This dissertation first presents the algorithms used in a prototype software system for automatic pronunciation assessment of Mandarin Chinese. The system uses forced alignment with hidden Markov models (HMMs) to identify each syllable and computes the corresponding log probability for phoneme assessment via a ranking-based confidence measure. The pitch vector of each syllable is then sent to a Gaussian mixture model (GMM) for tone recognition and assessment. We also compute similarity scores for intensity and rhythm between the target and test utterances. All four scores for phoneme, tone, intensity, and rhythm are parametric functions with certain free parameters, and the overall scoring function is formulated as a linear combination of these four scoring functions. Since the overall scoring function involves both linear and nonlinear parameters, we employ the downhill simplex search to fine-tune these parameters in order to approximate the scores given by a human expert. The experimental results demonstrate that the system gives consistent scores that are close to those of a human's subjective evaluation. Moreover, this study shows that tone recognition is a basic but important criterion for speech recognition and assessment of tonal languages such as Mandarin Chinese. Most previously proposed approaches adopt a two-step procedure: syllables within an utterance are first identified via forced alignment, and tone recognition is then performed on each segmented syllable using classifiers such as neural networks, GMMs, HMMs, and support vector machines (SVMs). However, forced alignment does not always generate accurate syllable boundaries, leading to unstable voiced/unvoiced detection and degraded tone-recognition performance.
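The weight-tuning step described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the component scores and human scores below are synthetic, and the four weights are fit with SciPy's Nelder-Mead (downhill simplex) routine, which assumes a simple mean-squared-error objective.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: per-utterance component scores for
# (phoneme, tone, intensity, rhythm) and matching human expert scores.
rng = np.random.default_rng(0)
component_scores = rng.uniform(0, 100, size=(20, 4))
hidden_weights = np.array([0.4, 0.3, 0.2, 0.1])      # unknown to the optimizer
human_scores = component_scores @ hidden_weights + rng.normal(0, 1.0, size=20)

def mse(w):
    # Mean squared error between the combined machine score and human scores.
    predicted = component_scores @ w
    return np.mean((predicted - human_scores) ** 2)

w0 = np.full(4, 0.25)                                 # start from equal weights
result = minimize(mse, w0, method="Nelder-Mead")      # downhill simplex search
```

After optimization, `result.x` holds the tuned linear weights and `result.fun` the remaining disagreement with the human scores; in the dissertation's setting the objective would also cover the nonlinear parameters inside each component scoring function.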
Aiming to alleviate this problem, we propose a robust approach called TRUES (tone recognition using extended segments) for HMM-based continuous tone recognition. The proposed approach extracts an unbroken pitch contour from a given utterance by applying dynamic programming over the time-domain average magnitude difference function (AMDF). The pitch contour of each syllable is then extended for tri-tone HMM modeling, so that the influence of inaccurate syllable boundaries is lessened. Our experimental results demonstrate that the proposed TRUES achieves a 49.13% relative error rate reduction over the recently proposed supratone model, which is deemed the state of the art in tone recognition and outperforms several previously proposed approaches. This encouraging improvement demonstrates the effectiveness and robustness of the proposed TRUES, as well as of the corresponding pitch determination algorithm, which produces unbroken pitch contours.
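The AMDF core of the pitch tracker can be sketched as follows. This is only an illustration of the per-frame computation on a synthetic sine wave, with made-up frame and pitch-range parameters; the full TRUES front end additionally runs dynamic programming across frames to select an unbroken, smoothly varying contour for the whole utterance.

```python
import numpy as np

def amdf(frame, tau):
    """Average magnitude difference function of a frame at lag tau."""
    n = len(frame) - tau
    return np.mean(np.abs(frame[:n] - frame[tau:tau + n]))

def pitch_from_frame(frame, fs, fmin=80.0, fmax=400.0):
    """Pick the lag minimizing the AMDF within a plausible pitch range."""
    taus = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    d = np.array([amdf(frame, t) for t in taus])
    return fs / taus[np.argmin(d)]          # AMDF dips at the pitch period

fs = 16000
t = np.arange(int(0.04 * fs)) / fs          # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 200.0 * t)       # synthetic 200 Hz voiced signal
f0 = pitch_from_frame(frame, fs)            # close to 200 Hz
```

In continuous speech the per-frame minimum is unreliable (octave errors, unvoiced regions), which is exactly why the proposed method optimizes the contour globally with dynamic programming instead of trusting each frame independently.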