
中文多義詞標記及其在語言模型的應用

Chinese Multiple Word Sense Labeling and Its Application to Language Modeling

Advisor: 王逸如

Abstract


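This thesis covers two lines of research: improving language models, and studying and applying Chinese word embeddings. For language model improvement, we use weighted finite-state transducers for speech recognition and replace the acoustic model with the correct phoneme sequence, given in advance, so that the recognition result is determined entirely by the language model. By improving word segmentation post-processing and the pronunciation dictionary, we build different language models that raise the recognition rate. The other line of research concerns Chinese word embeddings. We study how polysemy affects Chinese word vectors and use an unsupervised learning method that labels polysemous words with word vectors; the sense labeling draws on context and part-of-speech information to resolve polysemy. We run several qualitative analyses on the improved results. Finally, we add the sense information to the language model and train a language model that carries word sense information.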

English Abstract


This thesis has two parts: improving language models, and studying and applying Chinese word embeddings. In the first part, we apply weighted finite-state transducers to speech recognition. We replace the acoustic model with the correct phoneme sequence, so that the recognition result depends only on the language model. By improving the post-processing of word segmentation and the pronunciation dictionary, we build language models that raise recognition accuracy. In the second part, we study the effect of polysemy on Chinese word vectors. To address polysemy, we use unsupervised learning to label polysemous words with multiple word sense vectors learned from context and part-of-speech information. We propose several qualitative analyses to assess the improvement. Finally, we train a language model that carries semantic information on a word sense corpus whose polysemous words were labeled with these sense vectors.
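The abstract does not spell out the labeling procedure, so the following is only a minimal sketch of one common unsupervised approach, not the thesis's method: each occurrence of a polysemous word is represented by the average vector of its neighbouring words, the occurrences are clustered with k-means, and the word is rewritten with its cluster id to produce a sense-tagged corpus. The toy vectors, the English example words, the window size, and the function names (`context_vector`, `kmeans`, `label_senses`) are all assumptions made for illustration, and part-of-speech information is left out.

```python
from math import sqrt

# Toy 2-D word vectors; hypothetical values chosen only for illustration.
VECTORS = {
    "eat": (1.0, 0.1), "sweet": (0.9, 0.2), "juice": (0.8, 0.0),
    "phone": (0.1, 1.0), "laptop": (0.0, 0.9), "buy": (0.2, 0.8),
}

def context_vector(tokens, index, window=2):
    """Average the vectors of known words within `window` of position `index`."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    neighbours = [VECTORS[t] for i, t in enumerate(tokens[lo:hi], lo)
                  if i != index and t in VECTORS]
    if not neighbours:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(neighbours) for axis in zip(*neighbours))

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(distance(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda j: distance(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels

def label_senses(sentences, target, k=2):
    """Return a copy of the corpus with `target` rewritten as `target_<sense id>`."""
    positions = [(s, i) for s, sent in enumerate(sentences)
                 for i, tok in enumerate(sent) if tok == target]
    labels = kmeans([context_vector(sentences[s], i) for s, i in positions], k)
    tagged = [list(sent) for sent in sentences]
    for (s, i), label in zip(positions, labels):
        tagged[s][i] = f"{target}_{label}"
    return tagged

corpus = [
    ["eat", "sweet", "apple"],
    ["buy", "apple", "phone"],
    ["apple", "juice"],
    ["apple", "laptop"],
]
tagged = label_senses(corpus, "apple")
```

In this toy corpus the food-like and device-like occurrences of "apple" receive different sense tags. A sense-tagged corpus of this shape could then be passed to an n-gram or neural language model trainer in place of the plain word corpus, which is the general idea behind a language model with word sense information.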

