Chinese Word Segmentation as Character Tagging

Abstract


In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign to Chinese characters, or hanzi, tags that indicate the position of a hanzi within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach is competitive with other supervised machine-learning segmenters reported in previous studies, achieving precision and recall rates of 95.01% and 94.94% respectively when trained on a 237K-word training set.
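
The pipeline described in the abstract (position tags on each hanzi, conversion of tagged output back into segmented text, then precision and recall) can be illustrated with a minimal Python sketch. The abstract does not name the tag set, so the four tags B, M, E, S (word-beginning, word-middle, word-end, single-character word) are an assumption made here for illustration, as are all function names. The maximum entropy tagger itself is not reproduced; the sketch covers only the tag encoding, the decoding step, and word-level scoring.

```python
from typing import List, Set, Tuple

# Assumed tag set (not stated in the abstract):
# B = first hanzi of a word, M = middle, E = last, S = single-hanzi word.

def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (hanzi, position-tag) training pairs."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs

def tags_to_words(pairs: List[Tuple[str, str]]) -> List[str]:
    """Convert tagged hanzi back into segmented text."""
    words: List[str] = []
    current = ""
    for ch, tag in pairs:
        # A B or S tag starts a new word, so flush any unfinished one first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # An E or S tag closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) character offset pair."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans

def precision_recall(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall: a word counts only if both boundaries match."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall
```

For example, with the gold segmentation `["我", "喜欢", "北京"]` and a predicted segmentation `["我", "喜", "欢", "北京"]`, two of the four predicted words match gold words, so `precision_recall` gives a precision of 0.5 and a recall of 2/3.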

Cited by


Jiang, T. J. (2012). Syllable Word Segmentation for Mandarin Chinese via Double Ranking of the Left and Right Context [doctoral dissertation, National Tsing Hua University]. Airiti Library. https://doi.org/10.6843/NTHU.2012.00487
潘俊言 (2014). 中文文章修辭架構自動分類初步研究 [master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2014.00324
Huang, H. H. (2014). 中文語篇標記解釋與語篇關係辨識及其在意見極性分析之研究 [doctoral dissertation, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2014.00506
陳琦宇 (2009). 改良式貝氏分類器在情緒分類之研究 [master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-2907200920512600
