Improvement and Error Analysis of Mandarin Tone Recognition

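Mandarin tone recognition is an important topic in audio processing, and the factor that most directly affects the tone of a Mandarin syllable is its pitch contour. This thesis begins by defining features of the pitch contour. Beyond the pitch contour itself, Mandarin tones also vary with regional pronunciation characteristics and tone sandhi rules, so we also add features that relate each tone to its preceding and following tones, and use this combined feature set to model Mandarin tones. Once the tone-related features are identified, the thesis applies two common classifiers to tone recognition. The first is based on Gaussian mixture models (GMMs), with a separate model trained for each tone; the second uses the support vector machine (SVM) algorithm to find a suitable set of hyperplanes for tone classification. In addition, we apply a feature selection method to reduce the data dimensionality and observe whether the tone recognition rate changes significantly. Two corpora are used for experimental analysis and validation: Corpus455 (a single male speaker) and the Tang Poem corpus (multiple speakers: 2 female and 8 male). The experimental results show that for Corpus455 the feature dimensionality is reduced from 30 to 8, and the recognition rates with GMM and SVM increase by 6.05% and 1.70%, respectively. For the Tang Poem corpus the feature dimensionality is reduced from 30 to 10; the recognition rate decreases slightly, by 0.61%, with GMM, while with SVM it improves markedly from 62.05% to 72.49%, an increase of 10.43%.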
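The idea of training one generative model per tone and assigning a syllable to the best-scoring tone can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: each tone is modelled by a single diagonal-covariance Gaussian (the one-component special case of a GMM), the tones are given equal priors, the decision rule is maximum likelihood, and the two-dimensional feature vectors suggested in the usage note are hypothetical rather than the 30-dimensional features of the thesis.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (per-dimension mean, per-dimension variance)


def fit_gaussians(samples: List[Tuple[Vector, int]],
                  var_floor: float = 1e-6) -> Dict[int, Model]:
    """Fit one diagonal Gaussian per tone label (simplifying assumption:
    a single mixture component instead of a full GMM)."""
    by_tone: Dict[int, List[Vector]] = defaultdict(list)
    for x, tone in samples:
        by_tone[tone].append(x)

    models: Dict[int, Model] = {}
    for tone, xs in by_tone.items():
        n, dim = len(xs), len(xs[0])
        mean = [sum(x[d] for x in xs) / n for d in range(dim)]
        # Floor the variance so a degenerate dimension cannot cause division by zero.
        var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / n, var_floor)
               for d in range(dim)]
        models[tone] = (mean, var)
    return models


def log_likelihood(x: Vector, mean: List[float], var: List[float]) -> float:
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(x: Vector, models: Dict[int, Model]) -> int:
    """Return the tone whose model gives x the highest log-likelihood
    (equal tone priors assumed)."""
    return max(models, key=lambda tone: log_likelihood(x, *models[tone]))
```

For instance, `fit_gaussians` could be given a list of `(features, tone)` pairs in which the features are a hypothetical (pitch slope, mean pitch) pair per syllable, and `classify` then labels an unseen syllable. Replacing the single Gaussian with a weighted sum of several Gaussians per tone would give the GMM form named in the abstract.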

Keywords

Mandarin tone recognition

Parallel Abstract

Mandarin is a tonal language in which each syllable carries one of five tones. In general, the tone of a Mandarin syllable is characterized by its pitch contour, so this study adopts several acoustic features related to pitch information. Because tone is also influenced by differences in pronunciation and by tone sandhi rules, we add inter-syllabic acoustic features as well. With these features available, we apply two popular classifiers, the Gaussian mixture model (GMM) and the support vector machine (SVM), to tone recognition. We also use the sequential floating search method (SFSM) for feature selection. Experiments are conducted on two datasets, Corpus455 and TangPoem. With SFSM, the feature dimensionality is reduced from 30 to 8 for Corpus455 and from 30 to 10 for TangPoem. On Corpus455, the tone recognition rates of GMM+SFSM and SVM+SFSM improve by about 6.05% and 1.70%, respectively, compared with GMM and SVM alone. On TangPoem, the corresponding changes are about -0.61% and 10.43%.
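The feature selection step can be illustrated with a small sketch of sequential floating forward selection, one common variant of the floating search family. It is a sketch under assumptions, not the procedure used in the thesis: the `score` callback is a hypothetical stand-in for whatever criterion is optimized (for example, a held-out tone recognition rate), and the target subset size `k` is supplied by the caller.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int,
         score: Callable[[Subset], float],
         k: int) -> Subset:
    """Sequential floating forward selection of k out of n_features features.

    `score` is a caller-supplied (hypothetical) criterion: higher is better.
    """
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")

    best: Dict[int, Tuple[float, Subset]] = {}  # subset size -> (score, subset)
    current: Subset = frozenset()

    while len(current) < k:
        # Forward step: add the single feature that gives the best score.
        candidates = [current | {f} for f in range(n_features)
                      if f not in current]
        current = max(candidates, key=score)
        s = score(current)
        prev = best.get(len(current))
        if prev is None or s > prev[0]:
            best[len(current)] = (s, current)

        # Floating (backward) step: drop the least useful feature as long as
        # the smaller subset beats the best subset of that size seen so far.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=score)
            r = score(reduced)
            if r > best[len(reduced)][0]:
                best[len(reduced)] = (r, reduced)
                current = reduced
            else:
                break

    return best[k][1]
```

The backward step is what distinguishes the floating search from plain forward selection: a feature chosen early can be discarded later if a better subset of the same size turns up. Each backward move must strictly improve the best score recorded for that subset size, so the search terminates.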

Parallel Keywords

Tone Recognition; Mandarin


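Cited By

李宛穎 (2011). Using Pitch Information to Improve Mandarin Pronunciation Assessment [Master's thesis, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2011.00051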

