統計式語言模型 – 語音文件標記、檢索以及摘要

Statistical Language Modeling – Spoken Document Indexing, Retrieval and Summarization

Advisor: 陳信希 (Hsin-Hsi Chen)
Co-advisor: 王新民 (Hsin-Min Wang)

Abstract


The ever-increasing availability of multimedia documents has made spoken document understanding and organization an important research topic over the past two decades. Among the wide variety of related studies, spoken document indexing, retrieval, and summarization are regarded as important and fundamental problems in this field. Statistical language modeling, which is mainly used to quantify how plausible a piece of text is in natural language, has long been an interesting and highly challenging research area. Much prior work has been devoted to applying language models to spoken document processing tasks, and most of it has reported rich and remarkable experimental results. Given the importance of language modeling for spoken document processing, this thesis takes language modeling as its main thread and further investigates spoken document indexing, retrieval, and summarization.

Because the queries given by users are usually very short, posing a major challenge for information retrieval systems, this thesis starts from this problem: besides extensively studying previously proposed methods, it offers a unified view of the classical approaches and further applies these techniques to spoken document summarization. Next, inspired by the i-vector technique, the thesis proposes a novel language modeling method and combines it with pseudo-relevance feedback to improve spoken document retrieval performance. We also observe that although language models have been used for spoken document summarization, the techniques used in the past were all based on unigram models and could not capture long-span semantic information; the thesis therefore proposes training recurrent neural network language models with a curriculum learning strategy, successfully improving spoken document summarization. Finally, as the development of language modeling has gradually shifted from modeling to vectorization, the thesis proposes novel similarity measures that pair well with the various word embedding representations proposed in recent years and applies them to spoken document summarization; in addition, it proposes a probabilistic word embedding representation that not only inherits the advantages of classical representations but also effectively remedies the limited interpretability of today's word embeddings.
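For readers unfamiliar with this line of work, the query-likelihood criterion that underlies language-modeling retrieval is worth stating; the formulation below is the standard one from the literature (with Jelinek-Mercer smoothing), not necessarily the exact model proposed in the thesis:

\[
P(Q \mid D) \;=\; \prod_{w \in Q} \bigl[\, \lambda\, P(w \mid D) + (1-\lambda)\, P(w \mid \mathcal{C}) \,\bigr]^{c(w,Q)}
\]

Here \(c(w,Q)\) is the count of word \(w\) in query \(Q\), \(P(w \mid D)\) is the document's unigram model, \(P(w \mid \mathcal{C})\) is a background collection model, and \(\lambda\) is the smoothing weight. Pseudo-relevance feedback, mentioned above, re-estimates the query model from the top-ranked documents of a first-pass retrieval, which is exactly the kind of remedy for overly short queries that the thesis pursues.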

Keywords

Language Modeling; Spoken Documents; Indexing; Retrieval; Summarization

Parallel Abstract


The vast volumes of multimedia associated with spoken documents that have been made available to the public over the past two decades have brought spoken document understanding and organization to the forefront of research. Among the related subtasks, spoken document indexing, retrieval, and summarization can be thought of as the cornerstones of this research area. Statistical language modeling (LM), which purports to quantify the acceptability of a given piece of text, has long been an interesting yet challenging research area, and much research shows that language modeling for spoken document processing has enjoyed remarkable empirical success. Motivated by the great importance of and interest in language modeling for various spoken document processing tasks (i.e., indexing, retrieval, and summarization), language modeling is the backbone of this thesis.

In real-world applications, a serious challenge faced by search engines is that queries usually consist of only a few words to express users' information needs. This thesis starts with a general survey of this practical challenge, then not only proposes a principled framework that unifies the relationships among several widely used approaches but also extends this school of techniques to spoken document summarization. Next, inspired by the i-vector technique, an i-vector based language modeling framework is proposed for spoken document retrieval and reformulated to represent users' information needs more accurately.

Language models have also shown preliminary success in extractive speech summarization, but a central challenge facing the LM approach is how to formulate sentence models and accurately estimate their parameters for each sentence of the spoken document to be summarized. This thesis therefore proposes a framework built on recurrent neural network language models and a curriculum learning strategy, which shows promise in capturing not only word usage cues but also long-span structural information about word co-occurrence relationships within spoken documents, thus eliminating the strict bag-of-words assumption made by most existing LM-based methods.

Lastly, word embeddings have recently attracted much attention because of their excellent performance in many natural language processing (NLP) tasks, yet, as far as we are aware, relatively few studies have investigated their use in extractive text or speech summarization. This thesis first builds novel and efficient ranking models based on general word embedding methods for extractive speech summarization, and then proposes a novel probabilistic modeling framework for learning word and sentence representations that not only inherits the advantages of the original word embedding methods but also rests on a clear and rigorous probabilistic foundation.
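To make the embedding-based ranking idea concrete, the following is a minimal Python sketch of one simple instantiation: represent a sentence by the average of its word vectors and score it by cosine similarity to the whole-document vector. The toy embedding table, the tokenized document, and all identifiers are hypothetical illustrations; the ranking models and probabilistic representations proposed in the thesis are more elaborate than this baseline.

import numpy as np

# Toy pretrained word vectors (hypothetical; real systems would load
# trained embeddings such as word2vec or GloVe).
EMBEDDINGS = {
    "language": np.array([0.2, 0.7, 0.1]),
    "model": np.array([0.3, 0.6, 0.2]),
    "speech": np.array([0.8, 0.1, 0.3]),
    "retrieval": np.array([0.1, 0.2, 0.9]),
}

def sentence_vector(tokens):
    # Average the vectors of in-vocabulary words; zero vector otherwise.
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(u, v):
    # Cosine similarity, guarding against zero-norm vectors.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def rank_sentences(sentences):
    # Score each candidate sentence against the whole-document vector,
    # the core move in extractive, embedding-based summarization.
    doc_vec = sentence_vector([t for s in sentences for t in s])
    scored = [(cosine(sentence_vector(s), doc_vec), " ".join(s)) for s in sentences]
    return sorted(scored, reverse=True)

doc = [["language", "model"], ["speech", "retrieval"], ["model", "speech"]]
for score, sentence in rank_sentences(doc):
    print(f"{score:.3f}  {sentence}")

Averaging word vectors is the simplest composition choice; the abstract's point is precisely that more careful similarity measures, and probabilistic word and sentence representations, can improve on this kind of baseline.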

