A Class-based Language Model Approach to Chinese Named Entity Identification

This paper presents a method of Chinese named entity (NE) identification using a class-based language model (LM). Our NE identification concentrates on three types of NEs: personal names (PERs), location names (LOCs), and organization names (ORGs). Each type of NE is defined as a class. Our language model consists of two sub-models: (1) a set of entity models, each of which estimates the generative probability of a Chinese character string given an NE class; and (2) a contextual model, which estimates the generative probability of a class sequence. The class-based LM thus provides a statistical framework that incorporates Chinese word segmentation and NE identification in a unified way. This paper also describes methods for identifying nested NEs and NE abbreviations. Evaluation on a broad-coverage test set shows that the proposed model performs comparably to state-of-the-art Chinese NE identification systems.
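The interaction of the two sub-models can be illustrated with a minimal sketch. It is not the authors' implementation: the probability tables are invented toy values, the catch-all `WORD` class for ordinary words is an assumption made for illustration, the ORG class is omitted for brevity, and the contextual model is assumed to be a class bigram. The function `decode` (a hypothetical name) searches over all segmentations and class assignments for the one that maximizes the product of the contextual probability P(class | previous class) and the entity probability P(string | class), so segmentation and NE labelling are decided jointly.

```python
import math

# Toy parameters, invented for illustration only (not from the paper).
CLASSES = ["PER", "LOC", "WORD"]  # WORD is an assumed catch-all class

# Entity models: P(character string | class).
ENTITY = {
    "PER": {"李明": 0.5, "李": 0.1},
    "LOC": {"北京": 0.6},
    "WORD": {"在": 0.3, "李": 0.05, "明": 0.05, "北": 0.05, "京": 0.05},
}

# Contextual model, assumed here to be a class bigram: P(class | previous class).
CONTEXT = {
    "<s>": {"PER": 0.4, "LOC": 0.2, "WORD": 0.4},
    "PER": {"PER": 0.1, "LOC": 0.2, "WORD": 0.7},
    "LOC": {"PER": 0.2, "LOC": 0.1, "WORD": 0.7},
    "WORD": {"PER": 0.3, "LOC": 0.3, "WORD": 0.4},
}


def decode(text, max_len=4):
    """Return the best list of (substring, class) pairs covering text."""
    # best[(end, cls)] = (log probability, back-pointer to (start, prev_cls))
    best = {(0, "<s>"): (0.0, None)}
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            for cls in CLASSES:
                p_entity = ENTITY[cls].get(piece)
                if p_entity is None:
                    continue  # this class cannot generate the substring
                # Extend every partial analysis that ends where piece starts.
                for (pos, prev), (score, _) in list(best.items()):
                    if pos != start:
                        continue
                    cand = (score
                            + math.log(CONTEXT[prev][cls])
                            + math.log(p_entity))
                    key = (end, cls)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, (start, prev))
    # Pick the best analysis that covers the whole string.
    finals = [(s, k) for k, (s, _) in best.items() if k[0] == len(text)]
    if not finals:
        return []
    _, key = max(finals)
    path = []
    while best[key][1] is not None:
        start, prev = best[key][1]
        path.append((text[start:key[0]], key[1]))
        key = (start, prev)
    return path[::-1]


result = decode("李明在北京")
```

With these toy tables, `result` segments the string into a PER, an ordinary word, and a LOC, because that analysis has a higher joint probability than any character-by-character alternative. A real system would estimate both models from annotated data and smooth them for unseen strings.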

Keywords

Named entity identification; class-based language model; contextual model; entity model

