Using Speech Scoring for the Validation of Taiwanese Speech Corpus
(Chinese title: 使用語音評分輔助台語語料的驗證)

Advisors: 張智星, 張俊盛

Abstract


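Traditionally, organizing a speech corpus requires a great deal of manpower and time for listening tests. This thesis uses speech recognition combined with speech scoring to perform a preliminary screening of unprocessed Taiwanese speech data. The machine first filters out problematic audio files, such as recordings with too low a volume, too much noise, or incorrect content. This replaces the time-consuming traditional practice of manual listening. The thesis is divided into three stages: "baseline acoustic model training", "speech scoring and error-cause labeling", and "performance evaluation".

In the baseline acoustic model training stage, Taiwanese training data provided by a laboratory at Chang Gung University are used to train acoustic models based on hidden Markov models (HMMs). Three acoustic model units are used:
- monophone acoustic models;
- intra-syllable right-context-dependent biphone acoustic models;
- intra-syllable left-and-right-context-dependent triphone acoustic models.
The test data are recognized with a free-syllable decoding network. The best syllable accuracies of the three model types are 27.20%, 43.28%, and 45.93%, respectively.

In the speech scoring and error-cause labeling stage, the triphone acoustic models trained in the first stage score the corpus awaiting processing. Thresholds then divide the scores into three regions: low, middle, and high. The low-score utterances are manually labeled with the cause of their errors. Features are extracted from these utterances and used to train a support vector machine (SVM) classifier. This classifier then performs a second check on the low-score utterances, dividing them into usable and defective data.

In the performance evaluation stage, the original training data are combined in turn with the "unprocessed data", the "middle- and high-score data", and the "high-score data". Acoustic models are trained on each combination to compare performance before and after screening. The resulting syllable accuracies are 40.22%, 41.21%, and 44.35%, respectively.

Acoustic models trained on screened data thus outperform those trained on unscreened data by up to 4.13 percentage points in syllable accuracy. This confirms that the proposed method can use speech scoring to filter out problematic utterances automatically and effectively.

Keywords: Taiwanese corpus preparation, hidden Markov model, speech scoring, speech recognition, support vector machine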
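The screening stage described above can be illustrated with a small Python sketch. It splits utterance scores into the three regions, then trains a classifier for the second pass over the low-score region. This is an illustration under stated assumptions, not the thesis's implementation. The abstract does not give the thresholds, features, kernel, or SVM tool. The threshold constants `LOW_THRESHOLD`/`HIGH_THRESHOLD`, the two per-utterance features, and the helper names are hypothetical. In place of whatever SVM package the thesis used, the sketch trains a linear SVM with the Pegasos stochastic subgradient method, so it needs only the standard library.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical thresholds on an assumed 0-100 utterance score scale;
# the abstract does not report the values actually used.
LOW_THRESHOLD = 60.0
HIGH_THRESHOLD = 80.0


def partition(scores: Sequence[float],
              low: float = LOW_THRESHOLD,
              high: float = HIGH_THRESHOLD) -> Tuple[List[int], List[int], List[int]]:
    """Split utterance indices into low-, mid-, and high-score regions."""
    low_ids: List[int] = []
    mid_ids: List[int] = []
    high_ids: List[int] = []
    for i, s in enumerate(scores):
        if s < low:
            low_ids.append(i)
        elif s < high:
            mid_ids.append(i)
        else:
            high_ids.append(i)
    return low_ids, mid_ids, high_ids


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 500,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). Labels: +1 = usable, -1 = defective.
    A constant 1.0 feature is appended so the last weight acts as a bias;
    that bias is regularized too, a common simplification."""
    data = [([float(v) for v in x] + [1.0], label) for x, label in zip(X, y)]
    if not data:
        raise ValueError("no training examples")
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularizer shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (usable) or -1 (defective) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, [float(v) for v in x] + [1.0]))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    scores = [42.0, 71.5, 88.0, 55.0, 93.2]   # hypothetical utterance scores
    low_ids, mid_ids, high_ids = partition(scores)
    print(low_ids, mid_ids, high_ids)         # [0, 3] [1] [2, 4]

    # Hypothetical features of manually labeled low-score utterances,
    # e.g. [normalized loudness, normalized SNR estimate].
    X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7],
         [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    # Second pass: classify two further low-score utterances.
    print(predict(w, [0.95, 0.95]), predict(w, [0.05, 0.05]))
```

In a real pipeline, the scores would come from the triphone HMMs and the labels from the manual error annotation of the low-score region. A library implementation such as LIBSVM or scikit-learn's `sklearn.svm.SVC` would normally replace the hand-written trainer.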

English Abstract


Traditionally, preparing a speech corpus requires a great deal of labor and time for listening and selection. This research prepares a Taiwanese speech corpus by using speech recognition and speech assessment to find potentially problematic utterances automatically. The work has three main stages: acoustic model training, speech assessment and error labeling, and performance evaluation.

In the acoustic model training stage, we use the Taiwanese training dataset provided by Chang Gung University (CGU) to train hidden Markov models (HMMs) as acoustic models. We test three HMM types: monophone, intra-syllable right-context-dependent biphone, and intra-syllable left-and-right-context-dependent triphone. The recognition network is based on free syllable decoding. The best syllable accuracies of the three HMM types are 27.20%, 43.28%, and 45.93%, respectively.

In the speech assessment and error labeling stage, we use the trained triphone HMMs to score the unprocessed dataset. We then split it by score thresholds into low-scored, mid-scored, and high-scored subsets. For the low-scored subset, we manually identify and label the likely cause of each low score. We then extract features from these low-scored utterances and train a support vector machine (SVM) classifier. The classifier re-examines each of them to decide whether it should be removed.

In the performance evaluation stage, we measure how well problematic utterances are found. We train HMMs on the CGU training dataset joined with one of three sets: the entire unprocessed dataset, the mid-scored and high-scored subsets, or the high-scored subset only. The syllable accuracies of the HMMs trained on these three joint datasets are 40.22%, 41.21%, and 44.35%, respectively. HMMs trained on the screened data therefore exceed those trained on the unprocessed data by up to 4.13 percentage points in syllable accuracy.

This indicates that the processed dataset is less problematic than the unprocessed dataset, and that speech assessment can find potentially problematic utterances automatically. Keywords: Taiwanese corpus validation, Hidden Markov model, Speech assessment, Support vector machine.
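Syllable accuracy, as reported above, is conventionally computed as Acc = (N - S - D - I) / N × 100. Here N is the number of reference syllables, and S, D, and I are the substitutions, deletions, and insertions in an alignment of the recognized syllables against the reference. This is the "Accuracy" figure that HTK's HResults reports.

The abstract does not state which tool or alignment costs were used. The Python sketch therefore assumes a unit-cost minimum edit distance, for which S + D + I equals the edit distance exactly. It pools errors over all utterances rather than averaging per-utterance scores. The function names and the example syllables are hypothetical; the syllables use numbered-tone romanization and are not drawn from the thesis data.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions (unit costs)
    that turn the reference syllable sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))   # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or 0 on a match
        prev = cur
    return prev[-1]


def syllable_accuracy(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Corpus-level accuracy in percent, (N - S - D - I) / N * 100.

    N is the total number of reference syllables; errors are pooled over all
    (reference, hypothesis) pairs instead of averaging per-utterance scores.
    The result can be negative when insertions are numerous."""
    n = errors = 0
    for ref, hyp in pairs:
        n += len(ref)
        errors += edit_distance(ref, hyp)
    if n == 0:
        raise ValueError("no reference syllables")
    return 100.0 * (n - errors) / n


if __name__ == "__main__":
    # Hypothetical syllables in numbered-tone romanization (not thesis data).
    ref = ["gua2", "si7", "tai5", "uan5", "lang5"]
    hyp = ["gua2", "si7", "tai5", "lang5", "e5"]
    print(syllable_accuracy([(ref, hyp)]))  # 60.0
```

Insertions count against accuracy, so the figure can fall below zero. Percent correct, (N - S - D) / N × 100, ignores insertions.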
