Chinese Word Segmentation as Character Tagging

In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign each Chinese character, or hanzi, a tag that indicates its position within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach is competitive with other supervised machine-learning segmenters reported in previous studies, achieving precision and recall of 95.01% and 94.94%, respectively, when trained on a 237K-word training set.
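The two conversion steps described above can be sketched as follows. Characters are labelled with position tags, and a tag sequence is turned back into words and scored against a gold segmentation. This is a minimal illustration under stated assumptions. It uses a hypothetical four-tag B/M/E/S position scheme rather than the paper's own tag set. It also omits the maximum entropy tagger itself, and the function names are invented for the example. Word-level precision is the fraction of predicted words that match a gold word exactly. Recall is the fraction of gold words that were recovered.

```python
from typing import List, Set, Tuple

# Hypothetical four-tag position scheme (an assumption, not the paper's tag set):
#   B = first character of a multi-character word
#   M = middle character of a multi-character word
#   E = last character of a multi-character word
#   S = a single-character word


def words_to_tags(words: List[str]) -> List[str]:
    """Assign one position tag to every character of a segmented sentence."""
    tags: List[str] = []
    for word in words:
        if not word:
            raise ValueError("empty word in segmentation")
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Convert a tagged character sequence back into a list of words.

    A word is closed after every E or S tag, and an unclosed word is flushed
    before a new B or S tag, so inconsistent tagger output still segments.
    """
    if len(chars) != len(tags):
        raise ValueError("one tag per character is required")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # previous word was never closed
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)  # trailing word without an E tag
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word by its (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall; a predicted word is correct only
    if both of its boundaries match a gold word."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    if not gold_spans or not pred_spans:
        raise ValueError("both segmentations must be non-empty")
    correct = len(gold_spans & pred_spans)
    return correct / len(pred_spans), correct / len(gold_spans)


if __name__ == "__main__":
    gold = ["我們", "研究", "中文", "分詞"]
    chars = "".join(gold)
    # Simulated tagger output that wrongly joins the last two words.
    predicted = tags_to_words(chars, ["B", "E", "B", "E", "B", "M", "M", "E"])
    p, r = precision_recall(gold, predicted)
    print(predicted)                   # ['我們', '研究', '中文分詞']
    print(f"P={p:.3f} R={r:.3f}")      # P=0.667 R=0.500
```

The sketch scores words as character-offset spans. Under this common convention, a single wrong boundary counts against a word in both precision and recall.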

Keywords

Chinese word segmentation; supervised machine learning; maximum entropy; character tagging
