
Latent Semantic Language Modeling and Smoothing

Abstract


Language modeling plays a critical role in automatic speech recognition. Typically, n-gram language models suffer from a poor representation of the word history and from an inability to estimate parameters for unseen events when training data are insufficient. In this study, we explore the application of latent semantic information (LSI) to language modeling and parameter smoothing. Our approach adopts latent semantic analysis to transform all words and documents into a common semantic space. The word-to-word, word-to-document, and document-to-document relations are then exploited for language modeling and smoothing. For language modeling, we present a new representation of historical words based on retrieval of the most relevant document. We also develop a novel parameter smoothing method, in which the language models of seen and unseen words are estimated by interpolating the κ nearest seen words in the training corpus. The interpolation coefficients are determined according to the closeness of words in the semantic space. Experiments show that the proposed modeling and smoothing methods significantly reduce the perplexity of language models at moderate computational cost.
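The smoothing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the truncated SVD projection, the use of cosine similarity as the measure of closeness, the normalization of the interpolation weights, the toy count matrix, and the function names `lsa_word_vectors`, `cosine_similarities`, and `smoothed_probability`.

```python
import numpy as np


def lsa_word_vectors(counts, rank):
    """Map each word (row of a word-by-document count matrix) into a
    rank-dimensional semantic space using a truncated SVD."""
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return U[:, :rank] * s[:rank]


def cosine_similarities(vectors, index):
    """Cosine similarity between word `index` and every word."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms[:, None]
    return unit @ unit[index]


def smoothed_probability(target, seen_probs, vectors, k):
    """Estimate P(target | history) by interpolating the probabilities of
    the k seen words closest to `target` in the semantic space.

    seen_probs maps word index -> probability for words observed after
    the history. Weights are clipped cosine similarities normalized to
    sum to one (an assumed weighting, not taken from the paper).
    """
    sims = cosine_similarities(vectors, target)
    seen = np.array(sorted(seen_probs))
    nearest = seen[np.argsort(-sims[seen])][:k]
    weights = np.clip(sims[nearest], 0.0, None)
    if weights.sum() == 0:
        weights = np.ones(len(nearest))  # fall back to uniform weights
    weights = weights / weights.sum()
    return float(sum(w * seen_probs[int(i)] for w, i in zip(weights, nearest)))


# Toy data (hypothetical): 4 words, 3 documents.
counts = np.array([[2, 0, 1],
                   [1, 0, 1],
                   [0, 3, 0],
                   [0, 2, 1]], dtype=float)
vectors = lsa_word_vectors(counts, rank=2)

# Word 1 was never seen after this history; words 0 and 2 were.
p = smoothed_probability(target=1, seen_probs={0: 0.6, 2: 0.1},
                         vectors=vectors, k=2)
```

Because the estimate is a convex combination of the neighbors' probabilities, it always lies between the smallest and largest of them. The estimates for all words would still need to be renormalized to form a proper distribution, a step this sketch leaves out.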


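Cited by


邱炫盛 (2006). Using Topic- and Position-Dependent Language Models for Chinese Continuous Speech Recognition [Master's thesis, National Taiwan Normal University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0021-0712200716132659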
