
Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach

Abstract


Electronically available multilingual information can be divided into two major categories: (1) alphabetic language information (English-like alphabetic languages) and (2) ideographic language information (Chinese-like ideographic languages). The information available in non-English alphabetic languages, as well as in ideographic languages (especially Japanese and Chinese), has been growing rapidly in recent years. Because Japanese and Chinese are ideographic, and because several encoding standards are in use, efficient processing (representation, indexing, retrieval, etc.) of such information has become a tedious task. In this paper, we propose a Han character (Kanji) oriented Interlingua model for indexing and retrieving Japanese and Chinese information. We report the results of mono- and cross-language information retrieval on a Kanji space in which documents and queries are represented as Kanji-oriented vectors. We also employ a dimensionality reduction technique to compute a Kanji Conceptual Space (KCS) from the initial Kanji space, which can facilitate conceptual retrieval of both mono- and cross-language information for these languages. Similar indexing approaches for multiple European languages, through term association (e.g., latent semantic indexing) or through conceptual mapping (using a lexical ontology such as WordNet), are being intensively explored. The Interlingua approach investigated here for Japanese and Chinese is similar to the term (or concept) association model investigated for the European languages, and the two approaches can be easily integrated. The proposed Interlingua model can therefore pave the way for handling multilingual information access and retrieval efficiently and uniformly.
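The Kanji-space idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the dimensionality reduction technique, so a truncated SVD (in the style of latent semantic indexing) is assumed here. The four toy documents, the helper names (`is_han`, `build_vocab`, `kanji_vector`, `cosine`, `rank`), the restriction to the basic CJK Unified Ideographs block, and the choice `k = 2` are all hypothetical.

```python
import numpy as np

def is_han(ch):
    """True for characters in the basic CJK Unified Ideographs block (extensions ignored)."""
    return "\u4e00" <= ch <= "\u9fff"

def build_vocab(docs):
    """Map every Han character seen in the collection to a vector index."""
    chars = sorted({ch for doc in docs for ch in doc if is_han(ch)})
    return {ch: i for i, ch in enumerate(chars)}

def kanji_vector(text, vocab):
    """Count occurrences of known Han characters; kana and unknown characters are skipped."""
    vec = np.zeros(len(vocab))
    for ch in text:
        idx = vocab.get(ch)
        if idx is not None:
            vec[idx] += 1.0
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query_vec, doc_matrix):
    """Return document indices sorted by descending cosine similarity, plus the scores."""
    scores = [cosine(query_vec, doc_matrix[:, j]) for j in range(doc_matrix.shape[1])]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy collection: two topics, each in Japanese and traditional Chinese.
docs = [
    "情報検索の研究",  # Japanese: research on information retrieval
    "資訊檢索的研究",  # Chinese: research on information retrieval
    "天気予報",        # Japanese: weather forecast
    "天氣預報",        # Chinese: weather forecast
]

vocab = build_vocab(docs)
A = np.column_stack([kanji_vector(d, vocab) for d in docs])  # characters x documents

# Retrieval in the initial Kanji space: one query matches both languages
# through the Han characters they share.
query = kanji_vector("研究", vocab)
raw_order, raw_scores = rank(query, A)

# Assumed reduction step: keep the top-k left singular vectors and project
# documents and queries into the same k-dimensional space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]
docs_k = U_k.T @ A
query_k = U_k.T @ query
reduced_order, reduced_scores = rank(query_k, docs_k)

print("raw ranking:", raw_order)
print("reduced ranking:", reduced_order)
```

In this toy collection the two weather documents share only half of their characters, so their similarity in the initial space is moderate, while in the reduced space they move much closer together. That is the kind of effect a conceptual space is meant to provide, though behaviour on real collections depends on the corpus and on the choice of `k`.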


