A Class-based Language Model Approach to Chinese Named Entity Identification

Abstract


This paper presents a method for Chinese named entity (NE) identification using a class-based language model (LM). The method targets three types of NE: personal names (PERs), location names (LOCs), and organization names (ORGs). Each type of NE is defined as a class. The language model consists of two sub-models: (1) a set of entity models, each of which estimates the generative probability of a Chinese character string given an NE class; and (2) a contextual model, which estimates the generative probability of a class sequence. The class-based LM thus provides a statistical framework that handles Chinese word segmentation and NE identification in a unified way. This paper also describes methods for identifying nested NEs and NE abbreviations. Evaluation on a broad-coverage test set shows that the proposed model matches the performance of state-of-the-art Chinese NE identification systems.
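
To make the two-part model concrete, the following is a minimal sketch, not the authors' implementation. It scores each candidate segmentation as the product of a contextual probability P(class | previous class) and an entity probability P(string | class), and picks the best one with a Viterbi search. Everything specific here is an assumption made for illustration: the catch-all WORD class for ordinary words, the bigram form of the contextual model, the lookup-table entity models, the default value for unlisted class pairs, and every probability value.

```python
import math

# Entity models P(string | class). Toy lookup tables; all values are
# invented for illustration and are not taken from the paper.
ENTITY_MODELS = {
    "PER": {"李明": 0.02},
    "LOC": {"北京": 0.05},
    "ORG": {"北京大学": 0.03},
    # "WORD" is an assumed catch-all class for ordinary (non-NE) words.
    "WORD": {"在": 0.05, "工作": 0.01, "北京": 0.001, "大学": 0.005,
             "李": 0.001, "明": 0.001},
}

# Contextual model P(class | previous class). A bigram is assumed here
# for simplicity; the values are again invented.
CONTEXT_MODEL = {
    ("<s>", "PER"): 0.3, ("<s>", "WORD"): 0.5,
    ("PER", "WORD"): 0.6, ("LOC", "WORD"): 0.6, ("ORG", "WORD"): 0.6,
    ("WORD", "WORD"): 0.5, ("WORD", "LOC"): 0.2, ("WORD", "ORG"): 0.2,
}
DEFAULT_CONTEXT = 0.1  # assumed probability for class pairs not listed above


def decode(text, max_len=4):
    """Return the best (string, class) segmentation of text, or None."""
    # best[i][c] = (log probability, (start index, previous class)) for the
    # best analysis of text[:i] whose last segment has class c.
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            for cls, model in ENTITY_MODELS.items():
                p_entity = model.get(piece)
                if p_entity is None:
                    continue  # this class cannot generate this string
                for prev, (score, _) in best[j].items():
                    p_ctx = CONTEXT_MODEL.get((prev, cls), DEFAULT_CONTEXT)
                    cand = score + math.log(p_ctx) + math.log(p_entity)
                    if cls not in best[i] or cand > best[i][cls][0]:
                        best[i][cls] = (cand, (j, prev))
    if not best[len(text)]:
        return None  # no segmentation covers the whole input
    # Follow back-pointers from the best final class to the start.
    i = len(text)
    cls = max(best[i], key=lambda c: best[i][c][0])
    segments = []
    while i > 0:
        _, (j, prev) = best[i][cls]
        segments.append((text[j:i], cls))
        i, cls = j, prev
    return segments[::-1]


if __name__ == "__main__":
    print(decode("李明在北京大学工作"))
```

With these toy numbers, the input 李明在北京大学工作 is segmented and labeled in a single pass: 李明 as PER, 北京大学 as ORG, and 在 and 工作 as ordinary words. The competing analysis that splits 北京大学 into the location 北京 plus the word 大学 is considered by the same search and loses on total probability, which is the sense in which segmentation and NE identification are handled together.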
