  • Thesis

鑑別式解碼應用於多重系統結合之中文大詞彙語音辨識

Discriminative Decoding on Multi-systems Combination for Improved Large Vocabulary Mandarin Speech Recognition

Advisor: 李琳山 (Lin-shan Lee)

Abstract


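Speech is one of the most important and convenient means of human communication. As technology advances, devices such as mobile phones and personal digital assistants (PDAs) have become commonplace, and with the spread of wireless communication and wireless networks, it is widely believed that speech will soon serve as the primary interface between people and the next generation of smart devices. Sufficiently high recognition accuracy, however, remains a prerequisite for any application, and discriminative decoding and multi-systems combination are two widely used methods that have been shown to improve recognition accuracy effectively.

In this thesis, we carry out a thorough study of both methods. For discriminative decoding, we study the theory of Minimum Bayes Risk decoding (MBR), Segment Minimum Bayes Risk decoding (SMBR), Minimum Time Frame Error decoding (TFE), and Optimal Bayes Classification decoding (OBC), and conduct complete experiments and analysis on large-vocabulary Mandarin broadcast news recognition. For multi-systems combination, we examine the widely used Recognizer Output Voting Error Reduction (ROVER) algorithm with one-best, N-best, and confusion network inputs, again with complete experiments on large-vocabulary Mandarin broadcast news.

Finally, we propose a multi-systems combination framework based on word graph merging, to which discriminative decoding can be applied successfully, so that the two techniques are tightly integrated. Preliminary experimental results show that, within this integrated framework, discriminative decoding and multi-systems combination complement each other and yield better recognition accuracy. This is because merging the word graphs of multiple systems provides a more comprehensive hypothesis space, which makes risk estimation in discriminative decoding more stable and accurate, while discriminative decoding in turn gives multi-systems combination the ability to select more accurate recognition results.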
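To make the MBR criterion concrete, the following is a minimal sketch of MBR decoding over an N-best list, not the word-graph implementation studied in the thesis. It assumes each hypothesis comes with a log score that can be normalized into a posterior, and it uses word-level edit distance as the loss. The function names (`edit_distance`, `posteriors`, `mbr_decode`) and the toy hypotheses are illustrative only.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def posteriors(log_scores):
    """Normalize log scores into probabilities (softmax over the list)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def mbr_decode(nbest):
    """Pick the hypothesis with the lowest expected edit distance.

    nbest: list of (tokens, log_score) pairs (assumed input format).
    Returns (tokens, expected_risk).
    """
    hyps = [tuple(tokens) for tokens, _ in nbest]
    post = posteriors([score for _, score in nbest])
    best = None
    for cand in hyps:
        risk = sum(p * edit_distance(other, cand)
                   for other, p in zip(hyps, post))
        if best is None or risk < best[1]:
            best = (cand, risk)
    return best


# Toy example: the highest-scoring hypothesis is not the MBR choice.
nbest = [
    (["a", "b", "c"], math.log(0.4)),
    (["a", "x", "c"], math.log(0.3)),
    (["a", "x", "d"], math.log(0.3)),
]
best_hyp, best_risk = mbr_decode(nbest)
print(best_hyp, round(best_risk, 3))
```

In this toy list the maximum-posterior hypothesis is `a b c`, but `a x c` has the lowest expected edit distance because it is close to every competitor, which is the behavior that distinguishes MBR from maximum a posteriori decoding.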

Parallel Abstract


Substantial efforts have been made in various areas towards improving the performance of large vocabulary continuous speech recognition (LVCSR) technologies. Two important areas, among many others, are rescoring over the word graph and combination of multiple systems. This thesis studies both areas in depth. In the area of rescoring by discriminative decoding, we studied Minimum Bayes Risk decoding (MBR), Segment Minimum Bayes Risk decoding (SMBR) [16], Minimum Time Frame Error decoding [17], and Optimal Bayes Classification decoding (OBC) [18], with experiments on a Chinese broadcast news corpus. For combining the outputs of several different systems, we focused on the ROVER technique with N-best input [9][20]. A new concept of an integrated hypothesis space for combining LVCSR systems is then proposed. Unlike conventional systems combination approaches such as ROVER, the hypothesis spaces are integrated directly, without string alignment. In this way the timing information of all word hypotheses is preserved, and the new framework is more flexible in the rescoring approaches it can use. Four different rescoring criteria on the integrated hypothesis space were further explored, and experiments on a Chinese broadcast news corpus indicated improved performance.
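As a rough illustration of why preserved timing matters, the sketch below pools timed word hypotheses from several systems and rescores them with a frame-level criterion in the spirit of minimum time frame error decoding. It is a simplified stand-in under stated assumptions, not the thesis's method: each system is represented by an explicit list of paths with posteriors rather than a word graph, system weights are supplied by hand, and all names (`merge_spaces`, `frame_posteriors`, `min_frame_error_path`) are hypothetical.

```python
from collections import defaultdict


def merge_spaces(systems):
    """Pool paths from several systems into one weighted hypothesis set.

    systems: list of (system_weight, paths), where paths is a list of
    (path_posterior, [(word, start_frame, end_frame), ...]).
    Returns a list of (weight, path) whose weights sum to 1.
    """
    pooled = []
    for sys_weight, paths in systems:
        z = sum(p for p, _ in paths)  # renormalize within each system
        for p, path in paths:
            pooled.append((sys_weight * p / z, tuple(path)))
    total = sum(w for w, _ in pooled)
    return [(w / total, path) for w, path in pooled]


def frame_posteriors(pooled):
    """Accumulate the posterior of each word at each time frame."""
    post = defaultdict(float)  # (frame, word) -> posterior mass
    for weight, path in pooled:
        for word, start, end in path:
            for t in range(start, end):
                post[(t, word)] += weight
    return post


def min_frame_error_path(pooled):
    """Return the pooled path with the highest expected frame accuracy."""
    post = frame_posteriors(pooled)

    def expected_correct_frames(path):
        return sum(post[(t, word)]
                   for word, start, end in path
                   for t in range(start, end))

    return max((path for _, path in pooled), key=expected_correct_frames)


# Toy example: two equally weighted systems, frames 0-9.
system_a = (0.5, [(0.6, [("x", 0, 5), ("y", 5, 10)]),
                  (0.4, [("x", 0, 5), ("z", 5, 10)])])
system_b = (0.5, [(0.7, [("w", 0, 5), ("z", 5, 10)]),
                  (0.3, [("x", 0, 5), ("z", 5, 10)])])

pooled = merge_spaces([system_a, system_b])
chosen = min_frame_error_path(pooled)
print(chosen)
```

No string alignment is needed here because every word carries its own start and end frame, so evidence from different systems is combined simply by summing posteriors over the frames each word covers. In the toy example the single most probable pooled path is `w z`, yet `x z` is selected because both systems support `x` in the first half and `z` in the second.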

References


[1] M. J. F. Gales, B. Jia, X. Liu, K. C. Sim, P. C. Woodland, and K. Yu, “Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System,” in Proc. ICASSP, 2005.
[3] D. Y. Kim, H. Y. Chan, G. Evermann, M. J. F. Gales, D. Mrva, K. C. Sim, and P. C. Woodland, “Development of the CU-HTK 2004 Broadcast News Transcription Systems,” in Proc. ICASSP, 2005.
[4] N. Kumar, “Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition,” Ph.D. thesis, Johns Hopkins University, Baltimore, 1997.
[5] N. Kumar and A. G. Andreou, “Heteroscedastic Discriminant Analysis and Reduced Rank HMMs for Improved Speech Recognition,” Speech Communication, vol. 26, no. 4, pp. 283-297, Dec. 1998.
[6] M. J. F. Gales, “Semi-tied Covariance Matrices for Hidden Markov Models,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, pp. 272-281, 1999.

Cited By


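朱忠玲 (2007). 大字彙中文連續語音辨識之聲學模型及特徵正規化 [Acoustic modeling and feature normalization for large-vocabulary continuous Mandarin speech recognition] [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2007.01136
朱芳輝 (2007). 資料選取方法於鑑別式聲學模型訓練之研究 [A study of data selection methods for discriminative acoustic model training] [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0204200815535282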
