
Exploiting Association Words to Retrieve Synonymous Transliterations from Web Snippets

Mining Synonymous Transliterations from the World Wide Web Using Association Words

Abstract


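Regions where Chinese is used, such as Taiwan, Hong Kong, and China, have no unified translation standard, so a single foreign term is often rendered as several different Chinese words. For example, the Australian city Sydney is transliterated by its pronunciation as 「雪梨」, 「雪黎」, or 「悉尼」, among others. This variation causes search engines to return incomplete results: a query for 「雪梨」 does not retrieve Web pages that use 「雪黎」 or 「悉尼」. In this study we propose a mining framework that, given a Chinese transliteration, searches Web pages to find as many of its synonymous Chinese transliterations as possible. The results can be applied to the problem of incomplete cross-language retrieval in search engines. The framework has two phases. First, we propose an efficient method for collecting relevant Web snippets that may contain synonymous transliterations. Second, we extract synonymous transliterations from the collected snippets. Experimental results demonstrate the feasibility of the proposed method and show that it effectively finds many synonymous transliterations. Moreover, compared with noise terms, most of the synonymous transliterations found rank higher in similarity to the input transliteration.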

Parallel Abstract


There is no translation standard across regions where Chinese is used, such as Taiwan, Hong Kong, and China. As a result, a foreign proper noun is often translated into several different Chinese words, which leads to incomplete search results when only one of those words is used as the query keyword. In this paper, we present a framework that, given an input Chinese transliteration, retrieves as many synonymous transliterations as possible from the Web. The results could be applied to query expansion to alleviate the incomplete search problem. The framework has two major phases. The first develops an effective method for collecting relevant Web snippets that may contain synonymous transliterations. The second extracts synonymous transliterations from that set of snippets. Experimental results show that the proposed framework is feasible and effective. Moreover, most of the extracted synonymous transliterations rank higher in similarity to the input transliteration than other noise terms do.
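The ranking and query-expansion ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the romanization table `TOY_PINYIN`, the use of `difflib.SequenceMatcher` as a stand-in similarity measure, the candidate list, and the `threshold` value are all assumptions made for illustration, and the candidates are supplied by hand rather than extracted from Web snippets.

```python
from difflib import SequenceMatcher

# Hypothetical toy romanization table (toneless Hanyu Pinyin). It covers only
# the characters in this example; a real system would need a full lexicon.
TOY_PINYIN = {
    "雪": "xue", "梨": "li", "黎": "li", "悉": "xi", "尼": "ni",
    "澳": "ao", "洲": "zhou", "歌": "ge", "劇": "ju",
}

def romanize(term):
    """Concatenate the toy pinyin of each character; unknown characters are kept as-is."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in term)

def rank_candidates(query, candidates):
    """Sort candidates by string similarity of their romanizations to the query's."""
    target = romanize(query)
    scored = [(c, SequenceMatcher(None, target, romanize(c)).ratio()) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def expand_query(query, ranked, threshold=0.4):
    """Build an OR query from the input term and candidates scoring at least `threshold`."""
    terms = [query] + [c for c, score in ranked if score >= threshold]
    return " OR ".join(f'"{t}"' for t in terms)

# Hand-picked candidates: two synonymous transliterations and two noise terms.
ranked = rank_candidates("雪梨", ["澳洲", "悉尼", "歌劇", "雪黎"])
print(ranked)
print(expand_query("雪梨", ranked))
```

With this toy data the two synonymous transliterations score above the two noise terms, so the expanded query covers all three spellings of Sydney. The cutoff of 0.4 is arbitrary and would need tuning on real data.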

