Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text

Abstract


This paper proposes an approach for identifying word candidates that are not Traditional Chinese words during word segmentation of Traditional Chinese text. These candidates include Japanese names (written in either Japanese Kanji or Traditional Chinese characters) and words written with character variants. For personal names, we introduce a probability model over name formats. We also propose a method for mapping Japanese Kanji to their corresponding Traditional Chinese characters; the same method can be used to detect words written with character variants. After integrating generation rules for various types of special words, together with their probability models, the F-measure of our word segmentation system rises from 94.16% to 96.06%. A second experiment shows that 83.18% of the 862 Japanese names in a set of 109 human-annotated documents are successfully detected.
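The Kanji-to-Traditional-Chinese mapping idea can be illustrated with a minimal sketch. The abstract does not describe the mapping table or the matching procedure, so everything here is an assumption for illustration: the tiny `KANJI_TO_TRADITIONAL` table, the toy lexicon, and the helper names `normalize` and `find_variant_words` are hypothetical. The sketch maps each character one-to-one and then reports substrings that match a lexicon entry only after mapping.

```python
# Hypothetical, hand-picked mapping from Japanese Kanji forms to the
# corresponding Traditional Chinese characters. A real table would be far
# larger; this one is NOT taken from the paper.
KANJI_TO_TRADITIONAL = {
    "沢": "澤",
    "広": "廣",
    "桜": "櫻",
    "辺": "邊",
    "斉": "齊",
}


def normalize(text: str) -> str:
    """Map each character to its Traditional Chinese form (1:1, length-preserving)."""
    return "".join(KANJI_TO_TRADITIONAL.get(ch, ch) for ch in text)


def find_variant_words(text: str, lexicon: set, max_len: int = 4) -> list:
    """Return (start, surface, mapped) for substrings that differ from their
    mapped form and whose mapped form is in the lexicon.

    At each start position the longest match is kept (an assumed heuristic).
    """
    hits = []
    mapped_text = normalize(text)
    for start in range(len(text)):
        for length in range(max_len, 0, -1):
            end = start + length
            if end > len(text):
                continue
            surface = text[start:end]
            mapped = mapped_text[start:end]
            # Only report words that are invisible to the lexicon as written.
            if surface != mapped and mapped in lexicon:
                hits.append((start, surface, mapped))
                break
    return hits


# Toy lexicon of Traditional Chinese entries (assumed for illustration).
LEXICON = {"小澤", "櫻花", "喜歡"}
candidates = find_variant_words("小沢喜歡桜花", LEXICON)
```

In a full segmenter, such candidates would be added to the word lattice alongside ordinary dictionary words and scored by the probability models; that scoring step is not sketched here.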
