
Acoustic Unit Merging Methods for Mandarin-English Bilingual Speech Recognition

Merging Acoustic Models for Improving Mandarin-English Bilingual Speech Recognition

Advisor: Jyh-Shing Roger Jang (張智星)

Abstract


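The target application of this thesis is a Mandarin-English bilingual speech recognition system for Taiwanese users, built on automotive devices with limited storage. The aim is to reduce the model size substantially without using language identification, while keeping recognition performance comparable to that of monolingual systems. The model is made smaller by merging acoustic units that are similar across the two languages: the different phonetic notations of Mandarin and English are integrated to find acoustic units suitable for merging, and several acoustic-unit merging methods are implemented to build the bilingual system. Beyond merging acoustic units, this thesis also merges state units with decision trees. Experimental results show that the classification and merging rules derived from decision trees merge similar Mandarin and English states at a finer granularity, which both reduces the model size and increases model robustness. In the decision-tree merging experiment, the model was reduced to one third of its original size, and overall bilingual recognition performance was 1.2% higher than that of the baseline model.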
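The abstract does not say which distance the data-driven technique uses to decide that two acoustic units are similar. As one possible illustration, the sketch below assumes each phone is summarized by a single diagonal-covariance Gaussian, measures cross-lingual similarity with the Bhattacharyya distance, and greedily merges Mandarin-English pairs whose distance falls below a threshold. The phone labels, the toy statistics, and the `threshold` value are hypothetical, and the distance measure and greedy pairing are swapped-in choices, not the thesis's documented method.

```python
import math
from dataclasses import dataclass


@dataclass
class Gauss:
    """Hypothetical single diagonal-covariance Gaussian summarizing one phone."""
    mean: list[float]
    var: list[float]
    count: float  # occupancy, e.g. number of training frames


def bhattacharyya(a: Gauss, b: Gauss) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    d = 0.0
    for m1, v1, m2, v2 in zip(a.mean, a.var, b.mean, b.var):
        s = 0.5 * (v1 + v2)
        d += (m1 - m2) ** 2 / (8.0 * s) + 0.5 * math.log(s / math.sqrt(v1 * v2))
    return d


def pool(a: Gauss, b: Gauss) -> Gauss:
    """Merge two Gaussians by occupancy-weighted moment matching."""
    n = a.count + b.count
    mean, var = [], []
    for m1, v1, m2, v2 in zip(a.mean, a.var, b.mean, b.var):
        m = (a.count * m1 + b.count * m2) / n
        second = (a.count * (v1 + m1 ** 2) + b.count * (v2 + m2 ** 2)) / n
        mean.append(m)
        var.append(second - m ** 2)
    return Gauss(mean, var, n)


def merge_cross_lingual(mandarin: dict, english: dict, threshold: float) -> dict:
    """Greedily pair Mandarin and English phones whose distance is below threshold.

    Each phone joins at most one cross-lingual pair; unpaired phones stay
    language-specific. Keys are prefixed with "zh_" / "en_" to avoid clashes.
    """
    candidates = sorted(
        (bhattacharyya(gm, ge), pm, pe)
        for pm, gm in mandarin.items()
        for pe, ge in english.items()
    )
    used_zh, used_en, units = set(), set(), {}
    for dist, pm, pe in candidates:
        if dist > threshold:
            break  # candidates are sorted, so no closer pair remains
        if pm in used_zh or pe in used_en:
            continue
        used_zh.add(pm)
        used_en.add(pe)
        units[f"zh_{pm}+en_{pe}"] = pool(mandarin[pm], english[pe])
    units.update({f"zh_{p}": g for p, g in mandarin.items() if p not in used_zh})
    units.update({f"en_{p}": g for p, g in english.items() if p not in used_en})
    return units


# Toy example with made-up 2-D statistics.
mandarin = {"s": Gauss([0.0, 1.0], [1.0, 1.0], 100), "a": Gauss([5.0, 5.0], [1.0, 1.0], 100)}
english = {"s": Gauss([0.1, 1.0], [1.0, 1.0], 50), "th": Gauss([-5.0, 0.0], [1.0, 1.0], 50)}
merged = merge_cross_lingual(mandarin, english, threshold=0.1)
print(sorted(merged))  # ['en_th', 'zh_a', 'zh_s+en_s']
```

Pooling by moment matching keeps the merged Gaussian's mean and variance equal to those of the combined training frames, so each accepted pair replaces two sets of parameters with one; in the toy data, four phone models become three units.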

English Abstract


The long-term goal of this research is to build a Mandarin-English bilingual speech recognition system for devices mounted in automobiles, where storage is limited. Accordingly, the purpose of this thesis is to reduce the model size effectively while maintaining performance comparable to that of a monolingual system, without using language identification. In this thesis, similar acoustic models are merged to reduce the number of model parameters. Similar acoustic units across the two languages are identified by analyzing their different phonetic notations with either knowledge-driven or data-driven techniques. In addition to directly merging the acoustic models of the two languages, this thesis proposes using decision trees to merge states of different HMMs (hidden Markov models). Experimental results show that merging models at a finer level via decision trees not only reduces the model size effectively but also improves the robustness of the bilingual models. Compared with the baseline models, state merging with decision trees reduces the model size to one third of the original and improves the bilingual recognition correct rate by 1.2%.
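The abstract does not detail how the decision tree is grown. A common approach, likelihood-based decision-tree state tying, pools the states at the root and repeatedly applies the yes/no question that most increases the training-data log-likelihood under one shared Gaussian per cluster; states that end in the same leaf are tied. The sketch below illustrates that idea under stated assumptions: the state statistics, labels such as `zh_m` and `en_s`, the question sets, and the thresholds `min_gain` and `min_count` are all hypothetical, and for brevity the questions ask about each state's own phone class rather than its phonetic context.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical sufficient statistics of one HMM state."""
    name: str          # e.g. "zh_m" or "en_m" (hypothetical labels)
    count: float       # state occupancy
    mean: list[float]
    var: list[float]   # diagonal covariance


def pooled_loglik(states: list[State]) -> float:
    """Approximate log-likelihood of the states' frames under one shared diagonal Gaussian."""
    n = sum(s.count for s in states)
    ll = 0.0
    for d in range(len(states[0].mean)):
        m = sum(s.count * s.mean[d] for s in states) / n
        second = sum(s.count * (s.var[d] + s.mean[d] ** 2) for s in states) / n
        v = max(second - m ** 2, 1e-6)  # variance floor
        ll += -0.5 * n * (1.0 + math.log(2.0 * math.pi * v))
    return ll


def build_tree(states, questions, min_gain, min_count):
    """Split a pool of states top-down; each returned leaf is one group of tied states."""
    base = pooled_loglik(states)
    best = None
    for members in questions.values():
        yes = [s for s in states if s.name in members]
        no = [s for s in states if s.name not in members]
        if not yes or not no:
            continue  # question does not split this node
        if min(sum(s.count for s in yes), sum(s.count for s in no)) < min_count:
            continue  # a child would have too little data
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [[s.name for s in states]]  # leaf: these states share parameters
    _, yes, no = best
    return (build_tree(yes, questions, min_gain, min_count)
            + build_tree(no, questions, min_gain, min_count))


states = [
    State("zh_m", 100, [0.0], [1.0]),
    State("en_m", 100, [0.2], [1.0]),
    State("zh_s", 100, [10.0], [1.0]),
    State("en_s", 100, [10.3], [1.0]),
]
questions = {"nasal": {"zh_m", "en_m"}, "english": {"en_m", "en_s"}}
leaves = build_tree(states, questions, min_gain=5.0, min_count=50)
print(leaves)  # [['zh_m', 'en_m'], ['zh_s', 'en_s']]
```

Because the split criterion is the log-likelihood gain, cross-language states that the data barely distinguishes (here the toy Mandarin and English nasals) stay in the same leaf, so four states are tied into two.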

