Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

This paper presents Mencius, a Chinese named entity recognizer (NER). It addresses Chinese NER by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human knowledge and can be tuned conveniently, while ML-based systems are robust, portable, and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool called InfoMap [Wu et al. 2002] into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are written manually, and the ME framework estimates their weights from the training data. We configure Mencius in four distinct settings for two purposes. The first is to understand how word segmentation might influence Chinese NER. The second is to understand how a pure template-based method differs from our hybrid method. The best configuration in our experiments achieved F-measures of 94.3%, 77.8%, and 75.3% for person names (PER), location names (LOC), and organization names (ORG), respectively. A comparison of the results across these configurations shows that the hybrid NER systems always perform better than the pure template-based systems at identifying person names. However, they have some difficulty identifying location and organization names. Furthermore, a word segmentation module improves the performance of pure template-based NER systems but has little effect on hybrid NER systems.
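The core idea is that hand-written templates act as binary ME features, and the data determine the weights of those features. A small conditional maximum entropy model can illustrate this: p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x). The sketch below is not Mencius or InfoMap. Several parts of it are hypothetical illustrations:

- the template functions,
- the surname and suffix lists,
- the toy training data.

Each feature pairs one template with one label. The weights are fitted by plain batch gradient ascent on the log-likelihood with a small L2 penalty. This may differ from the estimation procedure the paper actually uses. As a further simplification, the sketch classifies already-extracted candidate strings rather than locating entity boundaries in running text.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for InfoMap templates: each takes a candidate
# string and its left context and returns True if the template matches.
Template = Callable[[str, str], bool]

SURNAMES = {"王", "李", "張", "陳"}          # hypothetical surname list
ORG_SUFFIXES = ("公司", "大學", "銀行")      # hypothetical organization suffixes
LOC_SUFFIXES = ("市", "縣", "省")            # hypothetical location suffixes

TEMPLATES: Dict[str, Template] = {
    "bias": lambda cand, left: True,  # always fires; learns class priors
    "starts_with_surname": lambda cand, left: cand[:1] in SURNAMES,
    "two_or_three_chars": lambda cand, left: len(cand) in (2, 3),
    "ends_with_org_suffix": lambda cand, left: cand.endswith(ORG_SUFFIXES),
    "ends_with_loc_suffix": lambda cand, left: cand.endswith(LOC_SUFFIXES),
    "left_context_zai": lambda cand, left: left.endswith("在"),
}
LABELS = ["PER", "LOC", "ORG"]

# Hypothetical toy training data: (candidate, left context, gold label).
TRAIN = [
    ("王小明", "", "PER"), ("李大華", "", "PER"), ("陳明", "", "PER"),
    ("台北市", "在", "LOC"), ("高雄市", "", "LOC"), ("新竹縣", "在", "LOC"),
    ("台灣銀行", "", "ORG"), ("清華大學", "在", "ORG"), ("宏碁公司", "", "ORG"),
]

Weights = Dict[Tuple[str, str], float]  # (template name, label) -> lambda


def active(cand: str, left: str) -> List[str]:
    """Names of the templates that match this candidate."""
    return [name for name, t in TEMPLATES.items() if t(cand, left)]


def probs(weights: Weights, feats: List[str]) -> Dict[str, float]:
    """p(y | x) = exp(sum of active feature weights for y) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs: int = 200, lr: float = 0.5, l2: float = 0.01) -> Weights:
    """Fit weights by batch gradient ascent on the L2-penalized log-likelihood."""
    weights: Weights = {(f, y): 0.0 for f in TEMPLATES for y in LABELS}
    for _ in range(epochs):
        grad = {k: 0.0 for k in weights}
        for cand, left, gold in data:
            feats = active(cand, left)
            p = probs(weights, feats)
            for f in feats:
                for y in LABELS:
                    # observed feature count minus the model's expected count
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - p[y]
        for k in weights:
            weights[k] += lr * (grad[k] / len(data) - l2 * weights[k])
    return weights


def predict(weights: Weights, cand: str, left: str = "") -> str:
    """Return the label with the highest conditional probability."""
    p = probs(weights, active(cand, left))
    return max(p, key=p.get)


if __name__ == "__main__":
    w = train(TRAIN)
    for cand, left in [("張三豐", ""), ("嘉義市", "在"), ("中華電信公司", "")]:
        print(cand, predict(w, cand, left))
```

In this design the templates decide only which features fire, while the training data decide how much each one counts. Consider a template that does not distinguish between classes, such as the two-or-three-character length test, which fires for both the person and location examples here. It should contribute little to separating PER from LOC, rather than forcing a hard rule decision the way a pure template-based system would.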



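Cited by

陳凱勛 (2013). 自動化擷取地理資訊以結合電子文件與WebGIS-以現代旅遊遊記為例 [Automatically extracting geographic information to integrate electronic documents with WebGIS: A case study of modern travelogues] [master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2013.02635

Lee, C. W. (2009). Boosting the Accuracy of a Chinese Factoid Question Answering System with Hybrid Modules and Lightweight Methods [doctoral dissertation, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-1111200916012635

楊善順 (2014). 蘊涵分析於改進中文文字蘊涵識別系統 [Applying entailment analysis to improve a Chinese textual entailment recognition system] [master's thesis, Chaoyang University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0078-0905201416542675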
