
用統計方法自動化擷取未知詞之研究

A Study on Automatic Unknown Word Extraction Based on Statistical Approach

Advisor: 黃光璿

Abstract

With the rapid development of the Internet, online news has become one of the main channels through which people obtain information. New things, ideas, and topics of attention emerge every day, and in this era of information explosion new terms are being coined constantly. Because these terms differ in how they are composed, no single rule can identify all new unknown words. Written Chinese also differs substantially from Western writing systems: it has no spaces to mark word boundaries, so word segmentation is a critical step in processing Chinese text, and the presence of unknown words strongly affects the quality of the segmentation results. Extracting unknown words is therefore an important problem in Chinese natural language processing (NLP).

