Unknown Word Detection for Chinese by a Corpus-based Learning Method

Abstract


One of the most prominent problems in computer processing of Chinese is identifying the words in a sentence. Because no blanks mark word boundaries, word identification is made difficult by segmentation ambiguities and by out-of-vocabulary words (i.e., unknown words). This paper proposes a corpus-based learning method that derives sets of syntactic rules for distinguishing monosyllabic words from monosyllabic morphemes, which may be parts of unknown words or typographical errors. The corpus-based learning approach offers three advantages: (1) automatic rule learning, (2) automatic evaluation of each rule's performance, and (3) a balance between recall and precision rates through dynamic rule set selection. Experimental results show that, for detecting unknown words, the rule set derived by the proposed method outperformed hand-crafted rules written by human experts.
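
A minimal sketch of the general idea, under a simplified setup that is assumed here and not taken from the paper: each monosyllabic token left after segmentation becomes one example carrying its left and right neighbours and a gold label; candidate rules are single-feature templates (previous token, current token, next token) that accept a monosyllable as a genuine word; each rule's accuracy is measured automatically on the labelled corpus; and a rule set is selected by an accuracy threshold. Monosyllables that no selected rule accepts are flagged as possible parts of unknown words. The rule templates, function names, `<s>` marker, and toy corpus are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical data model (not from the paper): one example per monosyllabic
# token left after segmentation, with its neighbours and a gold label.
#   is_word=True  -> a genuine monosyllabic word
#   is_word=False -> a morpheme of an unknown word (or a typo)
Example = Tuple[str, str, str, bool]  # (prev, cur, next, is_word)
Rule = Tuple[str, str]                # (template name, token value)

# Toy labelled corpus; "<s>" marks sentence start. 王/小/明 are the pieces
# of an out-of-vocabulary personal name split into single characters.
TOY_CORPUS: List[Example] = [
    ("我", "是", "學生", True),
    ("<s>", "王", "小", False),
    ("王", "小", "明", False),
    ("小", "明", "是", False),
    ("很", "小", "。", True),
    ("明", "是", "學生", True),
]


def instantiate(prev: str, cur: str, nxt: str) -> List[Rule]:
    """Instantiate every (hypothetical) rule template for one context."""
    return [("prev", prev), ("cur", cur), ("next", nxt)]


def learn_rules(corpus: List[Example]) -> Dict[Rule, Tuple[int, int]]:
    """Automatic rule learning: for each candidate rule, count
    (times it matched, times the matched token was a genuine word)."""
    stats: Dict[Rule, List[int]] = defaultdict(lambda: [0, 0])
    for prev, cur, nxt, is_word in corpus:
        for rule in instantiate(prev, cur, nxt):
            stats[rule][0] += 1
            if is_word:
                stats[rule][1] += 1
    return {rule: (m, c) for rule, (m, c) in stats.items()}


def select_rules(stats: Dict[Rule, Tuple[int, int]],
                 min_accuracy: float, min_support: int = 1) -> Set[Rule]:
    """Rule set selection: keep rules whose accuracy (correct / matched)
    clears the threshold and that matched at least min_support times."""
    return {
        rule
        for rule, (matched, correct) in stats.items()
        if matched >= max(min_support, 1) and correct / matched >= min_accuracy
    }


def is_unknown_candidate(prev: str, cur: str, nxt: str, rules: Set[Rule]) -> bool:
    """Flag a monosyllable when no selected rule accepts it as a word."""
    return not any(rule in rules for rule in instantiate(prev, cur, nxt))


def evaluate(corpus: List[Example], rules: Set[Rule]) -> Tuple[float, float]:
    """Precision and recall of flagging unknown-word morphemes.
    (Scored on the training data only for brevity; use held-out data in practice.)"""
    tp = fp = fn = 0
    for prev, cur, nxt, is_word in corpus:
        flagged = is_unknown_candidate(prev, cur, nxt, rules)
        if flagged and not is_word:
            tp += 1
        elif flagged and is_word:
            fp += 1
        elif not flagged and not is_word:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    stats = learn_rules(TOY_CORPUS)
    for threshold in (1.0, 0.5):
        rules = select_rules(stats, threshold)
        precision, recall = evaluate(TOY_CORPUS, rules)
        print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this sketch, raising `min_accuracy` keeps fewer word-accepting rules, so more monosyllables are flagged (higher recall, usually at some cost in precision); lowering it does the opposite. On the toy corpus, dropping the threshold from 1.0 to 0.5 admits the ambiguous rule `("cur", "小")`, and recall falls from 1.0 to 2/3 because the 小 inside 王小明 is no longer flagged.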
