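Mandarin tone recognition is an important topic in audio processing, and the pitch contour is the factor that most directly determines the tone of a Mandarin syllable. This thesis begins by defining features of the pitch contour. Beyond the pitch contour, Mandarin tones also vary with regional pronunciation characteristics and tone sandhi rules, so we additionally include features that relate each tone to its preceding and following tones, and we use this set of feature parameters to model Mandarin tones. Once the tone-related features have been extracted, this thesis applies two common classifiers to tone recognition. The first is based on Gaussian mixture models, with one model trained per tone; the second uses the support vector machine algorithm to find a suitable set of hyperplanes for tone classification. We also apply feature selection to reduce the data dimensionality and observe whether the tone recognition rate changes significantly. The experiments are carried out on two speech corpora: Corpus455 (a single male speaker) and the TangPoem corpus (multiple speakers: 2 female and 8 male). For Corpus455, the feature dimension is reduced from 30 to 8, and the recognition rates with the Gaussian mixture model and the support vector machine increase by 6.05% and 1.70%, respectively. For the TangPoem corpus, the feature dimension is reduced from 30 to 10; the recognition rate drops slightly, by 0.61%, with the Gaussian mixture model, whereas with the support vector machine it rises markedly from 62.05% to 72.49%, an improvement of 10.43%.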
Mandarin is a tonal language in which each syllable carries one of five tones. The tone of a Mandarin syllable is generally characterized by its pitch contour, so this study adopts several acoustic features derived from pitch information. Because tones are also influenced by variation in pronunciation and by tone sandhi rules, we additionally include inter-syllabic acoustic features. With these features, we apply two popular classifiers, the Gaussian mixture model (GMM) and the support vector machine (SVM), to tone recognition. We also use the sequential floating search method (SFSM) for feature selection. Experiments are conducted on two datasets, Corpus455 and TangPoem. The results show that SFSM reduces the feature dimensionality from 30 to 8 for Corpus455 and from 30 to 10 for TangPoem. On Corpus455, the tone recognition rates of GMM+SFSM and SVM+SFSM are about 6.05% and 1.70% higher, respectively, than those of GMM and SVM alone. On TangPoem, the corresponding changes are about -0.61% and 10.43%.
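As a rough illustration of the pipeline summarized above (one generative model per tone, combined with wrapper-style feature selection), the following minimal sketch may help. It is not the implementation used in this study. It assumes a single diagonal Gaussian per tone in place of a full mixture, plain sequential forward selection in place of the floating variant (SFSM also allows backward steps), and synthetic three-dimensional data in place of the pitch-contour features. All function names and data are hypothetical.

```python
import math
import random


def fit_gaussians(X, y, feats):
    """Fit one diagonal Gaussian per tone label over the chosen feature indices.

    Simplification: a single Gaussian stands in for a Gaussian mixture.
    """
    models = {}
    for label in sorted(set(y)):
        rows = [[x[f] for f in feats] for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A variance floor keeps the log-likelihood finite for near-constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*rows), means)]
        models[label] = (means, variances)
    return models


def log_likelihood(x, means, variances):
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)
               for v, m, var in zip(x, means, variances))


def predict(models, x, feats):
    """Return the tone label whose model gives x the highest log-likelihood."""
    sub = [x[f] for f in feats]
    return max(models, key=lambda label: log_likelihood(sub, *models[label]))


def accuracy(train, test, feats):
    """Train on `train`, then score on `test`, using only the features in `feats`."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    models = fit_gaussians(X_tr, y_tr, feats)
    hits = sum(predict(models, x, feats) == t for x, t in zip(X_te, y_te))
    return hits / len(y_te)


def forward_select(train, test, n_features):
    """Greedy forward selection: add the best feature until accuracy stops improving."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feat = max((accuracy(train, test, selected + [f]), f)
                          for f in candidates)
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected, best


def make_toy_data(n_per_tone, seed):
    """Synthetic data (hypothetical): feature 0 separates the tones, 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for tone, center in ((1, 0.0), (4, 8.0)):
        for _ in range(n_per_tone):
            X.append([rng.gauss(center, 1.0),
                      rng.gauss(0.0, 1.0),
                      rng.gauss(0.0, 1.0)])
            y.append(tone)
    return X, y


train = make_toy_data(100, seed=0)
test = make_toy_data(50, seed=1)
selected, score = forward_select(train, test, n_features=3)
print("selected features:", selected, "held-out accuracy:", score)
```

In this toy setting the selection step keeps the informative dimension and discards the noise dimensions, which mirrors in miniature the reduction from 30 features to 8 or 10 reported above. A faithful reproduction would also need the actual acoustic features, full mixture models, an SVM, and the floating search procedure.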