
The Personal Name Modeling in Mandarin ASR System (國語語音辨識系統中之人名語言模型)

Advisors: 王逸如, 陳稷康

Abstract


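This thesis has two main goals. The first is to train a high-performance Mandarin speech recognition system. The second is to mitigate the out-of-vocabulary (OOV) problem caused by personal names and to recognize those names, so that different types of speech can later be transcribed automatically into verbatim transcripts. Recognized personal names are also important training data for future natural language processing work. The thesis is built on the Kaldi speech recognition toolkit. For the acoustic model, the experiments use several kinds of neural networks, such as CNN, LSTM, TDNN, and DNN, to convert acoustic information into phone sequences. For the language model, the thesis adds language information specific to Chinese, such as merging variant word forms and decomposing proper nouns, and uses n-gram language model training together with lattice rescoring to convert phone sequences into word sequences. Parameters and weights are tuned during decoding to find the best operating point, yielding a recognition system that balances real-time performance against recognition accuracy. In addition, to address the long-standing failure to recognize personal names, the thesis builds a dedicated personal-name language model that, in a manner similar to a class-based model, replaces the personal names in the original word-based model.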

Parallel Abstract


This thesis has two purposes. The first is to train an efficient ASR system. The second is to reduce the OOV (out-of-vocabulary) problem caused by personal names and to recognize those names, so that different kinds of speech data can be transcribed automatically. Recognized names are also important training data for NLP. The work is based on the Kaldi speech recognition toolkit. For the acoustic model, we use several kinds of neural networks, such as CNN, LSTM, TDNN, and DNN, to transform the speech signal into a phone sequence. For the language model, we add language information specific to Chinese, such as variant word combination and named entity decomposition, and use an n-gram language model with lattice rescoring to transform the phone sequence into a word sequence. We also tune the parameters and weights during decoding to find the best operating point, obtaining an ASR system that is both accurate and fast at recognition time. Finally, we address the difficulty of recognizing personal names by building a class-based-like model that replaces the personal names in the original word-based model.
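The abstract does not give the details of the personal-name model, so the following is only a minimal sketch of the general class-based idea it alludes to, written in Python rather than with Kaldi or SRILM. It assumes a bigram model, a hypothetical class symbol `<NAME>`, a hypothetical three-entry name list, a toy corpus, and a uniform distribution over names inside the class. None of these choices come from the thesis.

```python
from collections import Counter

NAME_TOKEN = "<NAME>"  # hypothetical class symbol, not taken from the thesis


def replace_names(tokens, names):
    """Map every token found in the name list to the shared class symbol."""
    return [NAME_TOKEN if tok in names else tok for tok in tokens]


def train_bigram(sentences):
    """Collect history and bigram counts for a maximum-likelihood bigram model."""
    history, bigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        history.update(padded[:-1])                   # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))  # adjacent token pairs
    return history, bigrams


def bigram_prob(prev, word, model):
    """Unsmoothed P(word | prev); returns 0.0 for an unseen history."""
    history, bigrams = model
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


def class_bigram_prob(prev, word, model, names):
    """P(word | prev) with personal names routed through the class symbol."""
    prev = NAME_TOKEN if prev in names else prev
    if word in names:
        # P(<NAME> | prev) * P(word | <NAME>); the second factor is assumed uniform.
        return bigram_prob(prev, NAME_TOKEN, model) * (1.0 / len(names))
    return bigram_prob(prev, word, model)


# Hypothetical name list and toy corpus (already word-segmented).
names = {"王小明", "陳大同", "李美玲"}
corpus = [
    ["請", "王小明", "發言"],
    ["請", "陳大同", "發言"],
    ["請", "大家", "發言"],
]

word_model = train_bigram(corpus)
class_model = train_bigram([replace_names(s, names) for s in corpus])

# "李美玲" is in the name list but never occurs in the corpus.
p_word = bigram_prob("請", "李美玲", word_model)
p_class = class_bigram_prob("請", "李美玲", class_model, names)
print(p_word, p_class)
```

In this toy setup the word-based model assigns zero probability to a name it never saw, while the class-based variant shares the counts of all observed names through `<NAME>` and so gives the unseen name a nonzero score. A real system would add smoothing and a better within-class distribution than the uniform one assumed here.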

