
基於搜尋結果之循序性中文分詞

A Search-result-based Sequential Method for Chinese Segmentation

Advisor: 鄭卜壬

Abstract


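Word segmentation is a basic and essential task in Chinese natural language processing. Traditional approaches usually combine dictionary-based and statistical segmentation, and sometimes add further external resources or techniques, such as unknown-word extraction and part-of-speech analysis, to reach satisfactory results. What these studies share is a need for human involvement: a large body of reference material must be collected before segmentation can begin. This study instead focuses on collecting resources automatically for Chinese word segmentation. We propose a supervised learning method, a two-stage Chinese word segmentation algorithm assisted by a search engine. The method overcomes a drawback of traditional segmentation, which needs its dictionaries, corpora, and other reference material to be updated and expanded frequently to keep segmentation quality high. In the first stage, we use information provided by a search engine so that our model has more segmentation information to learn from. In the second stage, we design a sequential segmentation algorithm modeled on how people read. Experimental results show that the search engine does help resolve the problem of newly coined words while achieving satisfactory segmentation results.

Parallel Abstract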


Chinese word segmentation is a vital and required step in many Chinese text processing tasks. Many methods have been proposed in previous studies to address this problem using dictionary-based or statistical algorithms. To achieve high performance, some of these studies used external resources or other techniques, such as unknown-word identification and part-of-speech tagging. Some combined these with various machine learning algorithms to aid segmentation. The goal of this paper is to propose a simple supervised learning method that uses a search engine to help Chinese word segmentation without human intervention. In the first stage, we use training data to construct a classifier that predicts whether the gap between each pair of adjacent Chinese characters is a word boundary; in the second stage, we propose a sequential method to complete the segmentation. The experimental results show that our system performs very well, and explanations and analysis are also presented in this paper.
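The two-stage idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the classifier features or the sequential rule, so everything named here is an assumption. `TOY_COHESION` is a hypothetical stand-in for search-engine evidence, `toy_boundary_prob` stands in for the trained stage-one classifier, and `sequential_segment` is one simple left-to-right reading of the stage-two step with an assumed threshold of 0.5.

```python
from typing import Callable, List

# Hypothetical stand-in for search-engine evidence: how strongly two adjacent
# characters stick together as one unit (values are invented for illustration).
TOY_COHESION = {"中文": 0.9, "分詞": 0.9, "文分": 0.1}


def toy_boundary_prob(left: str, right: str) -> float:
    """Stage 1 stand-in: probability that the gap between two characters is a boundary.

    A real system would use a trained classifier here; this toy version simply
    inverts the cohesion score and falls back to 0.5 for unseen pairs.
    """
    return 1.0 - TOY_COHESION.get(left + right, 0.5)


def sequential_segment(
    text: str,
    boundary_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the thesis
) -> List[str]:
    """Stage 2 sketch: scan left to right and cut wherever a gap looks like a boundary."""
    if not text:
        return []
    words: List[str] = []
    current = text[0]
    for left, right in zip(text, text[1:]):
        if boundary_prob(left, right) >= threshold:
            # The gap is judged to be a word boundary: close the current word.
            words.append(current)
            current = right
        else:
            # No boundary: keep extending the current word.
            current += right
    words.append(current)
    return words


if __name__ == "__main__":
    print(sequential_segment("中文分詞", toy_boundary_prob))
```

Swapping `toy_boundary_prob` for a classifier trained on segmented text and fed with search-result features would keep the same interface; only the scoring function changes.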

