Construction of an Acoustic Model for Mandarin-Speaking Preschool Children and Related Studies

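This thesis is part of a collaboration with the Hsinchu Branch of National Taiwan University Hospital to develop an acoustic-feedback-assisted therapy tool for preschool children with articulation disorders. The goal is an automatic quantitative assessment system that detects articulation disorders in preschool children and serves as a teaching aid with which these children can practice on their own when no speech therapist is present. Although many studies on children's English pronunciation have been published, work on Mandarin remains quite limited. This thesis carries out the first step of the project: building a children's acoustic model suitable for an automatic speech recognition system. The speech recognition system is built with the Kaldi speech recognition toolkit, and recognition uses a deep neural network that includes convolutional neural network layers. With no children's training speech available at all, we propose a training-side signal enhancement method based on vocal tract length normalization to improve the robustness of an acoustic model trained on adult speech, addressing the mismatch between children's and adults' speech features at test time. Experimental results show that the proposed approach reduces the word error rate of children's speech recognition on the test data.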

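Keywords

Mandarin; children's speech recognition; acoustic model; deep neural network; convolutional neural network; articulation disorder; vocal tract length normalization; signal enhancement transformation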

Parallel Abstract

This thesis presents the results of a joint research project between National Chiao Tung University and the HsinChu Branch of National Taiwan University Hospital. The project aims to build an automatic pronunciation error detection mechanism for a computer-aided language learning system for preschool children with developmental articulation disorders. Although many studies on articulation disorders in English speech have been reported, studies on Mandarin speech remain limited. This thesis therefore reports the first step of the project: the construction of a robust acoustic model (AM) for a children's automatic speech recognition (ASR) system. A CNN-based DNN serves as the recognizer, and the system is implemented with the Kaldi speech recognition toolkit. The study proposes a VTLN-based speech enhancement method that improves the robustness of the AM without using any children's speech data in the model training phase. Experimental results show that the proposed method significantly reduces the word error rate in the children's ASR task.

Parallel Keywords

Mandarin; children's ASR; acoustic model; DNN; CNN; developmental articulation disorders; VTLN; data augmentation
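The abstract does not spell out the warping function behind the VTLN-based enhancement, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes a common piecewise-linear frequency warp and applies it to a magnitude spectrum so that spectral peaks in adult speech move toward higher frequencies, as in children's speech. The function names `vtln_warp` and `warp_spectrum`, the `knee` parameter, the 8 kHz band limit, and the direction convention (a warp factor below 1 shifts peaks upward) are all hypothetical choices made for illustration.

```python
import numpy as np


def vtln_warp(freqs, alpha, f_max=8000.0, knee=0.85):
    """Piecewise-linear frequency warp (illustrative assumption).

    Frequencies below a cutoff are scaled by alpha; above it, a straight
    line brings the warp back so that f_max maps to f_max.
    """
    freqs = np.asarray(freqs, dtype=float)
    # Cutoff chosen so the scaled segment never exceeds f_max.
    f0 = knee * f_max / max(alpha, 1.0)
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    return np.where(
        freqs <= f0,
        alpha * freqs,
        alpha * f0 + upper_slope * (freqs - f0),
    )


def warp_spectrum(mag, alpha, f_max=8000.0):
    """Resample a 1-D magnitude spectrum along the warped frequency axis.

    Each output bin at frequency f reads the input spectrum at
    vtln_warp(f), so alpha < 1 moves spectral peaks to higher frequencies.
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.linspace(0.0, f_max, mag.shape[-1])
    source_freqs = vtln_warp(freqs, alpha, f_max)
    return np.interp(source_freqs, freqs, mag)


# Toy spectrum with a single peak at 1000 Hz (bin 32 of 257 bins over 0-8 kHz).
spectrum = np.zeros(257)
spectrum[32] = 1.0
shifted = warp_spectrum(spectrum, alpha=0.8)
peak_hz = np.argmax(shifted) * 8000.0 / 256  # location of the shifted peak in Hz
```

In an augmentation setting, several warp factors would typically be applied to each adult utterance so that the training set covers a range of apparent vocal tract lengths; the range of factors to use is an experimental choice that the abstract does not specify.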

