Translated Titles

An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition



Key Words

language model ; language model adaptation ; topic mixture model ; maximum entropy



Chinese Abstract

Over the past three decades, statistical language modeling has been an important research topic in a wide variety of applications related to natural language. Its function is to capture the various kinds of information present in natural language, such as contextual information and semantic information, and then to use this information to quantify, in probabilistic terms, how likely a given word sequence is. For example, in speech recognition the role of the language model is to resolve the problem of acoustic confusion by selecting the correct recognition result from among the candidate word sequences.

In recent years, speech recognition has found more and more applications in daily life, such as voice dictation and call routing systems. However, recognition performance is often seriously affected by differences in the vocabulary and semantics of each recognition task, and this gave rise to research on language model adaptation. Language model adaptation exploits the lexical and semantic information inherent in the recognition task to compensate for the mismatch between the training corpus and the test corpus.

This thesis applies the topic mixture model (TMM), originally proposed for probabilistic information retrieval, to dynamically exploit long-span topical information, and obtains good results when it is used for language model adaptation. In addition, this thesis studies the maximum entropy (ME) method in depth. ME is a method for integrating different information sources: each information source gives rise to a set of constraints that the combined language model must satisfy. The intersection of these constraints is the set of probability distributions that satisfy all of the information, and within this set the distribution with the highest entropy is the solution of the method. Preliminary experimental results show that the language model obtained by combining unigram, bigram, and trigram information with the ME method achieves better character error rate (CER) and perplexity on Mandarin broadcast news transcription than a language model trained with conventional maximum likelihood estimation.
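The adaptation idea described above can be sketched in a few lines. The topic distributions, background probability, and weight-inference heuristic below are illustrative assumptions, not the thesis's actual models: a real TMM would estimate its topic mixtures with EM on training documents, whereas this sketch infers topic weights from the normalized likelihood of the recent history.

```python
import math

# Hypothetical topic-conditional unigram distributions (illustrative only).
topic_unigrams = {
    "sports":  {"game": 0.5, "team": 0.4, "market": 0.1},
    "finance": {"game": 0.1, "team": 0.1, "market": 0.8},
}

def topic_weights(history, topics):
    """Infer mixture weights P(topic | history) by normalizing the
    likelihood each topic assigns to the recent words (a simplification
    of EM-based inference)."""
    scores = {}
    for topic, unigram in topics.items():
        score = 1.0
        for w in history:
            score *= unigram.get(w, 1e-6)  # floor for unseen words
        scores[topic] = score
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}

def tmm_prob(word, history, background_prob, lam=0.5):
    """Interpolate the long-span topical probability with a background
    n-gram probability (here a fixed illustrative value)."""
    weights = topic_weights(history, topic_unigrams)
    topical = sum(weights[t] * topic_unigrams[t].get(word, 1e-6)
                  for t in topic_unigrams)
    return lam * topical + (1.0 - lam) * background_prob

# A history dominated by "market" shifts weight to the finance topic,
# raising the adapted probability of "market" above its background value.
p = tmm_prob("market", ["market", "market"], background_prob=0.2)
```

The key property is that the same word receives a different probability depending on the long-span history, which is exactly the dynamic adaptation behavior the abstract describes.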

English Abstract

Statistical language modeling, which aims to capture the regularities in human natural language and quantify the acceptability of a given word sequence, has continuously been an important research issue in a wide variety of applications of natural language processing (NLP) over the past three decades. For example, in speech recognition, the principal role of the language model is to help resolve acoustic confusion and thus separate the correct hypothesis from the competing ones. In recent years, many applications of speech recognition technology have been developed, such as voice dictation and call routing systems. However, speech recognition performance is often seriously affected by the varying lexical and semantic characteristics of different application tasks. Thus, there is always a need for language model adaptation, whose goal is to exploit the specific lexical and semantic information inherent in the recognition domain so as to compensate for the mismatch between training and testing conditions. In this thesis, a topical mixture model (TMM) previously proposed for probabilistic information retrieval was investigated to dynamically explore long-span latent topical information for language model adaptation. Moreover, we also studied the use of the Maximum Entropy (ME) principle for language modeling. ME is a principle for efficiently combining a variety of information sources. Under the ME criterion, each information source gives rise to a set of constraints that is imposed on the resultant language model. The intersection of these constraints is the set of language model probability distributions that satisfy all of them, and the distribution with the highest entropy in this set is the solution of the ME principle.
The preliminary experimental results show that the ME-based language modeling approach can achieve superior performance over the conventional Maximum Likelihood (ML) based approach in both character error rate and perplexity reductions on the Mandarin broadcast news transcription task.
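The constraint-intersection picture above is usually solved with Generalized Iterative Scaling (GIS). The following is a minimal sketch on a toy unigram event space; the corpus, indicator features, and feature-sum constant C are illustrative assumptions, not the thesis's actual feature set, which combines unigram, bigram, and trigram constraints.

```python
import math
from collections import Counter

# Toy corpus defining the empirical distribution the constraints encode.
corpus = ["a", "a", "b", "c"]
events = ["a", "b", "c"]
empirical = {w: c / len(corpus) for w, c in Counter(corpus).items()}

# One binary indicator feature per word; every event activates exactly
# one feature, so the GIS feature-sum constant C is 1.
features = [lambda w, v=v: 1.0 if w == v else 0.0 for v in events]
C = 1.0

def model_probs(lambdas):
    """Exponential-form ME model: p(w) proportional to exp(sum_i l_i f_i(w))."""
    weights = {w: math.exp(sum(l * f(w) for l, f in zip(lambdas, features)))
               for w in events}
    Z = sum(weights.values())
    return {w: weights[w] / Z for w in events}

def gis(iterations=50):
    """Generalized Iterative Scaling: nudge each weight until the model's
    feature expectations match the empirical ones."""
    lambdas = [0.0] * len(features)
    emp_expect = [sum(empirical.get(w, 0.0) * f(w) for w in events)
                  for f in features]
    for _ in range(iterations):
        p = model_probs(lambdas)
        for i, f in enumerate(features):
            model_expect = sum(p[w] * f(w) for w in events)
            lambdas[i] += (1.0 / C) * math.log(emp_expect[i] / model_expect)
    return model_probs(lambdas)

# The converged ME model reproduces exactly the marginals the constraints
# demand, and among all distributions doing so it has the highest entropy.
p_me = gis()
```

With richer, overlapping features (e.g. unigram and bigram indicators firing on the same event), the constraints no longer pin down a unique distribution directly, and the maximum entropy criterion is what selects one; GIS still converges to it, just more slowly.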

Topic Category Basic and Applied Sciences > Information Science
College of Science > Graduate Institute of Computer Science and Information Engineering
References
  1. [Aubert 2002] X. L. Aubert. "An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, January 2002.
  2. [Ball et al. 1967] G. H. Ball, and D. J. Hall. A Clustering Technique for Summarizing Multivariate Data. Behavioral Science, Volume 12, pages 153-155, 1967.
  3. [Bellegarda 2000] J. R. Bellegarda. Exploiting latent semantic information in statistical language modeling. Proceedings of the IEEE, Volume 88, pages 1279-1296, August 2000.
  4. [Bellegarda 2004] J. R. Bellegarda. Statistical language model adaptation: review and perspectives. Speech Communication, 42, 2004.
  5. [Bellegarda 2005] J. R. Bellegarda. Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling. To appear in IEEE Signal Processing Magazine, September 2005.
  6. [Chang et al. 2003] P-C Chang and L-S Lee. Improved Language Model Adaptation Using Existing and Derived External Resources. In Proceedings of ASRU, pages 531-536, December, 2003.
  7. [Chen 2005] B. Chen. Exploring the Use of Latent Topical Information for Statistical Chinese Spoken Document Retrieval. Accepted for publication in Pattern Recognition Letters, 2005.
  8. [Chen et al. 2002] B. Chen, H-M Wang, and L-S Lee. Discriminating Capabilities of Syllable-Based Features and Approaches of Utilizing Them for Voice Retrieval of Speech Information in Mandarin Chinese. IEEE Trans. On Speech and Audio Processing, Volume 10 (5), pages 303-314, July 2002.
  9. [Chen et al. 2004a] B. Chen, J-W Kuo, and W-H Tsai. Lightly Supervised and Data-Driven Approaches to Mandarin Broadcast News Transcription. In Proceedings of ICASSP, Volume 1, pages 777-780, May 2004.
  10. [Chen et al. 2004c] B. Chen, W-H Tsai, and J-W Kuo. Statistical Language Model Adaptation for Mandarin Broadcast News Transcription. In Proceedings of ISCSLP04, pages 313-316.
  11. [Chen et al. 2005] B. Chen, Jen-Wei Kuo, Wen-Huang Tsai. “Lightly Supervised and Data-Driven Approaches to Mandarin Broadcast News Transcription,” International Journal of Computational Linguistics and Chinese Language Processing, Vol. 10, No. 1, pp.1-18, March 2005.
  12. [Chen et al. 1999] S. F. Chen, J. Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. Computer Speech and Language, 13, 1999.
  13. [Chen et al. 2003] L. Chen, J-L Gauvain, L. Lamel, and G. Adda. Unsupervised Language Model Adaptation for Broadcast News. In Proceedings of ICASSP, Volume 1, pages 220-223, April 2003.
  14. [Chueh et al. 2004] C-H Chueh, J-T Chien, and H-M Wang. A Maximum Entropy Approach for Integrating Semantic Information in Statistical Language Models. In Proceedings of ISCSLP04, pages 309-312.
  15. [Darroch et al. 1972] J. N. Darroch, D. Ratcliff. Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, Volume 43, pages 1470-1480, 1972.
  16. [Duda et al. 1973] R. O. Duda, and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.
  17. [Good 1963] I. J. Good. Maximum Entropy for Hypothesis Formulation, Especially for Multidimensional Contingency Tables. The Annals of Mathematical Statistics, Volume 34, No. 3, pages 911-934, September, 1963.
  18. [Jaynes 1957] E. T. Jaynes. Information Theory and Statistical Mechanics. Physics Reviews, Volume 106, no. 4, pages 620-630, 1957.
  19. [Katz 1987] S. M. Katz. Estimation of Probabilities from Sparse Data for the Language Model Component of A Speech Recognizer. IEEE Trans. On Acoustics, Speech and Signal Processing, Volume 35 (3), pages 400-401, March 1987.
  20. [Kuhn et al. 1990] R. Kuhn, and R. De Mori. A cache-based natural language model for speech recognition. IEEE Trans. On Pattern Analysis and Machine Intelligence, Volume 12, pages 570-582, June 1990.
  21. [Moriya et al. 2001] T. Moriya, K. Hirose, N. Minematsu, and H. Jiang. Enhanced MAP Adaptation of N-gram Language Models Using Indirect Correlation of Distant Words. In Proceedings of ASRU, pages 397-400, Italy, December 2001.
  22. [NIST] National Institute of Standards and Technology. http://www.nist.gov/ .
  23. [Ratnaparkhi 1997] A. Ratnaparkhi. A simple introduction to maximum entropy models for natural language processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania, 1997.
  24. [Rosenfeld 1996] R. Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, Volume 10, pages 187-228, 1996.
  25. [Rosenfeld 2000] R. Rosenfeld. Two Decades of Statistical Language Modeling: Where Do We Go from Here. In Proceedings IEEE, Volume 88, no. 8, pages 1270-1278, 2000.
  26. [Valsan et al. 2003] Z. Valsan and M. Emele. Thematic Text Clustering for Domain Specific Language Model Adaptation. In Proceedings of ASRU, pages 513-518, December 2003.
  27. [Wang et al. 2005] Hsin-min Wang, Berlin Chen, Jen-Wei Kuo, and Shih-Sian Cheng. “MATBN: A Mandarin Chinese Broadcast News Corpus,” accepted to appear in International Journal of Computational Linguistics and Chinese Language Processing.
  28. [Bacchiani et al. 2003] M. Bacchiani and B. Roark. Unsupervised Language Model Adaptation. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2003.
  29. [Baeza-Yates et al. 1999] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley Longman, 1999.
  30. [Berger et al. 1996] A. Berger, S. Della Pietra, and V. Della Pietra. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22 (1), pages 39-71, 1996.
  31. [Berger 1997] A. Berger. The Improved Iterative Scaling Algorithm: A gentle Introduction. December, 1997.
  32. [Chen et al. 2004b] B. Chen, J-W Kuo, Y-M Huang, and H-M Wang. Statistical Chinese Spoken Document Retrieval Using Latent Topical Information. In Proceedings of ICSLP Volume 11, pages 1621-1625, October 2004.
  33. [Chou et al. 2003] W. Chou (editor), B. H. Juang (editor). Pattern Recognition in Speech and Language Processing, CRC Press, 2003.
  34. [CNA news] Central News Agency news. http://www.cna.com.tw.
  35. [Della Pietra et al. 1992] S. Della Pietra, V. Della Pietra, R. Mercer, and S. Roukos. Adaptive Language Model Estimation Using Minimum Discrimination Estimation. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages 633-636, April 1992.
  36. [Dempster et al. 1977] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Volume 39, no. 1, pages 1-38, 1977.
  37. [Federico 1999] M. Federico. Efficient Language Model Adaptation Through MDI Estimation. In Proceedings of EUROSPEECH, Volume 4, pages 1583-1586, September 1999.
  38. [Gildea et al. 1999] D. Gildea, and T. Hofmann. Topic-Based Language Models Using EM. In Proceedings of EUROSPEECH, Volume 5, pages 2167-2170, September 1999.
  39. [Goodman 2001] J. Goodman. A Bit of Progress in Language Modeling, Extended Version. Microsoft Research, Machine Learning and Applied Statistics Group, Technical Report, 2001.
  40. [Jelinek 1977] F. Jelinek, R. L. Mercer, L. R. Bahl, and J. K. Baker. Perplexity—a measure of difficulty of speech recognition tasks. 94th Meeting of the Acoustic Society of America, December 1977, Miami Beach, FL.
  41. [Jelinek 1991] F. Jelinek. Up from Trigrams! The Struggle for Improved Language Models. In Proceedings of EUROSPEECH, pages 1037-1040, 1991.
  42. [Kim et al. 2004] W. Kim, and S. Khudanpur. Cross-Lingual Latent Semantic Analysis for Language Modeling. In Proceedings of ICASSP, Volume 1, pages 257-260, May 2004.
  43. [Kneser et al. 1997] R. Kneser, J. Peters, and D. Klakow. Language model adaptation using dynamic marginals. In Proceedings of EUROSPEECH, pages 1971-1974, Rhodes, Greece, 1997.
  44. [Kullback 1959] S. Kullback. Information Theory in Statistics. Wiley, New York, 1959.
  45. [Liu et al. 2003] X. Liu, and W. B. Croft. Statistical Language Modeling for Information Retrieval. To appear in the Annual Review of Information Science and Technology, Volume 39 (2005).
  46. [Miller et al. 1999] D. R. H. Miller, T. Leek, and R. Schwartz. A Hidden Markov Model Information Retrieval System. In Proceedings of ACM SIGIR Conference on R&D in Information Retrieval, pages 214-221, 1999.
  47. [Mori et al. 1999] R. De Mori, and M. Federico. Language model adaptation. In Computational Models of Speech Pattern Processing. K. Ponting, Ed., Volume 169 of F: Computer and Systems Sciences, pages 280-303, 1999.
  48. [Mrva et al. 2004] D. Mrva, and P. C. Woodland. A PLSA-based Language Model for Conversational Telephone Speech. In Proceedings of ICSLP, pages 2257-2260, October 2004.
  49. [Nanjo et al. 2003] H. Nanjo, and T. Kawahara. Unsupervised Language Model Adaptation for Lecture Speech Recognition. In Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, pages 75-78, 2003.
  50. [PTS] Public Television Service Foundation. http://www.pts.org.tw .
  51. [Sasaki et al. 2000] K. Sasaki, H. Jiang, and K. Hirose, Rapid Adaptation of N-gram Language Models Using Inter-word Correlation for Speech Recognition. In Proceedings of ICSLP, pages 508-511, Beijing, October 2000.
  52. [SLG] Spoken Language Group at Chinese Information Processing Laboratory, Institute of Information Science, Academia Sinica. http://sovideo.iis.sinica.edu.tw/SLG/index.htm .
  53. [SRILM] A. Stolcke. SRI Language Modeling Toolkit. version 1.3.3, http://www.speech.sri.com/projects/srilm/ .
  54. [Kuo et al. 2004] 郭人瑋, 蔡文鴻, 陳柏琳. "An Initial Application of Unsupervised Learning to Automatic Transcription of Chinese Broadcast News," in Proceedings of ROCLING XVI (the 16th Conference on Computational Linguistics and Speech Processing), 2004.
Times Cited
  1. 羅丞邑 (2011). Using Data Mining Techniques to Resolve Pronunciation Ambiguity of Polyphonic Words in an Online Hakka Text-to-Speech System. Master's thesis, Graduate Institute of Networking and Multimedia, National Chung Hsing University, 2011, pages 1-87.
  2. 廖振淵 (2010). Using Rough Set Theory to Resolve the Polyphone Problem in a Chinese-to-Taiwanese Text-to-Speech System. Master's thesis, Department of Computer Science and Engineering, National Chung Hsing University, 2010, pages 1-49.
  3. 林金玉 (2008). Predicting Polyphonic Pronunciations in a Chinese-to-Taiwanese Text-to-Speech System. Master's thesis, Department of Computer Science and Engineering, National Chung Hsing University, 2008, pages 1-55.
  4. 邱炫盛 (2006). Exploiting Topic- and Position-Dependent Language Models for Mandarin Continuous Speech Recognition. Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, 2006, pages 1-147.
  5. 陳冠宇 (2010). Improving Topic Models for Use in Speech Recognition. Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, 2010, pages 1-175.
  6. 張文遠 (2012). An Engineering Analysis of Automated External Defibrillators. Master's thesis, Graduate Institute of Electrical Engineering, Chung Yuan Christian University, 2012, pages 1-189.