
Chinese Word Segmentation by Classification of Characters

Abstract


Chinese word segmentation faces two main problems: segmentation ambiguity and unknown words. This paper describes a method to solve the segmentation problem. First, we segment the text with a dictionary-based approach, applying the Maximum Matching algorithm in the forward direction (FMM) and in the backward direction (BMM). Then, based on the differences between the FMM and BMM outputs and on the surrounding context, we apply a classification method based on Support Vector Machines to re-assign the word boundaries. The method thus takes the output of a dictionary-based approach and refines it with a machine-learning-based approach. Experimental results show that our model achieves an F-measure of 99.0 for overall segmentation when the text contains no unknown words, and an F-measure of 95.1 when unknown words are present.
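The first stage described in the abstract, forward and backward Maximum Matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the `max_len` limit, the single-character fallback, and the helper names (`forward_max_match`, `backward_max_match`, `disputed_boundaries`) are all assumptions made for illustration. The sketch only locates the positions where FMM and BMM disagree; the Support Vector Machine stage that re-assigns boundaries is not shown.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right segmentation, longest dictionary word first."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumption: fall back to a single character when nothing matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left segmentation, longest dictionary word first."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def boundaries(words):
    """Return the set of character offsets at which a word ends."""
    ends, pos = set(), 0
    for word in words:
        pos += len(word)
        ends.add(pos)
    return ends


def disputed_boundaries(fmm_words, bmm_words):
    """Offsets that only one of the two segmentations marks as a boundary."""
    return sorted(boundaries(fmm_words) ^ boundaries(bmm_words))


# Hypothetical toy dictionary, chosen to produce a classic ambiguity.
TOY_DICT = {"研究", "研究生", "生命", "起源"}
SENTENCE = "研究生命起源"

fmm = forward_max_match(SENTENCE, TOY_DICT, max_len=3)
bmm = backward_max_match(SENTENCE, TOY_DICT, max_len=3)
print(fmm)                            # ['研究生', '命', '起源']
print(bmm)                            # ['研究', '生命', '起源']
print(disputed_boundaries(fmm, bmm))  # [2, 3]
```

In this toy case the two passes disagree only around offsets 2 and 3, which is the kind of region where, per the abstract, a classifier would use the FMM/BMM difference and the context to decide the final boundaries.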

