Chinese Noun Sense Disambiguation Based on Syntagmatic Features

Word sense disambiguation (WSD) plays an important role in many areas of natural language processing, such as machine translation, information retrieval, sentence analysis, and speech recognition, and research on WSD has both theoretical and practical significance. The main purposes of this study were to identify the kinds of knowledge that are useful for WSD and to establish a new WSD model based on syntagmatic features that can effectively disambiguate noun senses in Mandarin Chinese. A close correlation has been found between a word's lexical meaning and its distribution. According to a study in the field of cognitive science [Choueka, 1983], people often disambiguate word sense using only a few other words in a given context (frequently only one additional word). Thus, the relationships between one word and others can be effectively used to resolve ambiguity. Based on a descriptive study of more than 4,000 Chinese noun senses, a multi-level framework of syntagmatic analysis was designed to describe the syntactic and semantic constraints of Chinese nouns. All of the polysemous nouns were surveyed, and it was found that different senses have different and complementary distributions at the syntactic and/or collocational levels. This finding served as the foundation for a WSD model that uses grammatical information and a thesaurus provided by linguists. The model uses the Grammatical Knowledge-base of Contemporary Chinese [Yu Shiwen et al. 2002] as one of its main machine-readable dictionaries (MRDs); it provides rich grammatical information for disambiguating Chinese lexical items, such as parts of speech (POS) and syntactic functions. The model's other resource is the Semantic Dictionary of Contemporary Chinese [Wang Hui et al. 1998], which provides a thesaurus and semantic collocation information for more than 20,000 nouns. These two resources were employed to analyze 635 Chinese polysemous nouns.
By making full use of these two MRD resources and a very large POS-tagged corpus of Mandarin Chinese, a multi-level WSD model based on syntagmatic features was developed. The experiment described at the end of the paper verifies that the approach achieves high levels of efficiency and precision.
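The idea that different senses of a noun occupy complementary syntactic and collocational contexts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SenseProfile` structure, the `disambiguate` function, the toy entry for 材料, its classifier sets, and the verb classes are invented here and are not taken from the Grammatical Knowledge-base, the Semantic Dictionary, or the model described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseProfile:
    """Hypothetical per-sense constraints at two levels of analysis."""
    sense_id: str
    classifiers: frozenset   # syntactic level: measure words the sense accepts
    verb_classes: frozenset  # collocation level: semantic classes of governing verbs


# Toy thesaurus mapping verbs to semantic classes (invented for illustration).
VERB_CLASS = {
    "加工": "manufacture",
    "生產": "manufacture",
    "整理": "information",
    "閱讀": "information",
}

# Toy sense inventory for one polysemous noun (invented for illustration).
SENSES = {
    "材料": [
        SenseProfile("substance", frozenset({"種", "批"}), frozenset({"manufacture"})),
        SenseProfile("document", frozenset({"份", "批"}), frozenset({"information"})),
    ]
}


def disambiguate(noun, classifier=None, verb=None):
    """Filter candidate senses level by level.

    A level is skipped when it would eliminate every remaining candidate,
    so the function always returns at least one sense id.
    """
    candidates = SENSES[noun]

    # Level 1: syntactic constraint (classifier / measure word).
    if classifier is not None:
        kept = [s for s in candidates if classifier in s.classifiers]
        candidates = kept or candidates

    # Level 2: collocation constraint (semantic class of the governing verb).
    if verb is not None:
        verb_class = VERB_CLASS.get(verb)
        kept = [s for s in candidates if verb_class in s.verb_classes]
        candidates = kept or candidates

    return [s.sense_id for s in candidates]
```

In this sketch a single context word is often enough to pick a sense, which mirrors the observation attributed to [Choueka, 1983]: `disambiguate("材料", classifier="份")` leaves only the `document` sense, whereas the ambiguous classifier 批 needs the additional verb cue.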

Keywords

Word Sense Disambiguation; syntagmatic features; noun sense; Chinese Language Information Processing

References

Choueka, Y., Lusignan, S. (1983). A Connectionist Scheme for Modeling Word Sense Disambiguation. Cognition and Brain Theory, 6(1), 89-120.

Church, Kenneth W., Gale, William A., Yarowsky, David (1993). A Method for Disambiguating Word Senses in a Large Corpus. Computers and the Humanities, 26, 415-439.

Kenneth, K. K. W., K. W. (1992). Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation.

Ide, Nancy, Veronis, Jean (1998). Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics, 24(1), 1-40.

Lam, Sze-Sing, Wong, Kam-Fai, Lum, Vincent (1997). LSD-C: A linguistic-based word-sense disambiguation algorithm for Chinese. Computer Processing of Oriental Languages, 10(4), 409-422.

Cited by

Shih, M. H. (2011). 基於中文詞網之領域詞義區分試驗 (An experiment on domain word-sense discrimination based on Chinese WordNet) [master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2011.00319
