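Chinese-speaking regions such as Taiwan, Hong Kong, and China do not share a unified translation standard, so a single foreign word is often rendered as several different Chinese words. For example, the Australian city of Sydney is transliterated by pronunciation as "雪梨", "雪黎", or "悉尼", among others. This inconsistency causes search engines to return incomplete results: a search for "雪梨" does not retrieve Web pages that use the transliterations "雪黎" or "悉尼". In this study, we propose a mining framework that, given a Chinese transliteration, searches Web pages to find as many of its synonymous Chinese transliterations as possible. The results can be applied to reduce incomplete cross-language retrieval by search engines. The framework has two phases. First, we propose an efficient method to collect relevant Web snippets that are likely to contain synonymous transliterations. Second, we extract synonymous transliterations from the collected snippets. Experimental results demonstrate the feasibility of the proposed method and show that it effectively finds many synonymous transliterations. Furthermore, compared with other noise terms, most of the synonymous transliterations found rank higher in similarity to the input transliteration.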
There is no unified translation standard across the Chinese-speaking regions, such as Taiwan, Hong Kong, and China. As a result, a foreign proper noun is often translated into several different Chinese words, which leads to the incomplete search problem when only one of these words is used as the query keyword for a search engine. In this paper, we present a framework that retrieves as many synonymous transliterations as possible from the Web for an input Chinese transliteration. The research results could be applied to query expansion to alleviate the incomplete search problem. The framework consists of two major phases. The first is an effective method to collect relevant Web snippets that may contain synonymous transliterations. The second extracts synonymous transliterations from the set of relevant Web snippets. Experimental results show that the proposed framework is feasible and effective. Moreover, compared with other noise terms, most of the extracted synonymous transliterations rank higher in similarity to the input transliteration.
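The abstract does not say how candidate terms are extracted from snippets or scored against the input. Purely as an illustration of the second phase (extract, then rank by similarity), the sketch below makes several assumptions that are not from the paper: a hypothetical toy pinyin table `PINYIN` stands in for a real character-to-pronunciation dictionary, every run of Han characters with the same length as the query is treated as a candidate, and Python's `difflib.SequenceMatcher` ratio over romanized syllables is used as a stand-in phonetic similarity. It is not the authors' method.

```python
from difflib import SequenceMatcher

# Hypothetical toy pronunciation table (toneless Hanyu Pinyin).
# A real system would need a full character-to-pinyin dictionary.
PINYIN = {
    "雪": "xue", "梨": "li", "黎": "li",
    "悉": "xi", "尼": "ni", "澳": "ao", "洲": "zhou",
}


def romanize(term: str) -> str:
    """Map each character to its toy pinyin; unknown characters become '?'."""
    return " ".join(PINYIN.get(ch, "?") for ch in term)


def is_han(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def candidate_terms(snippets, length):
    """Collect every distinct run of `length` consecutive Han characters."""
    found = set()
    for text in snippets:
        for i in range(len(text) - length + 1):
            chunk = text[i:i + length]
            if all(is_han(ch) for ch in chunk):
                found.add(chunk)
    return found


def rank_candidates(query, snippets):
    """Return (score, term) pairs sorted by phonetic similarity, highest first."""
    target = romanize(query)
    scored = []
    for term in candidate_terms(snippets, len(query)):
        if term == query:
            continue  # the input itself is not a synonym candidate
        score = SequenceMatcher(None, target, romanize(term)).ratio()
        scored.append((score, term))
    # Sort by descending score, then by term for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored
```

For example, `rank_candidates("雪梨", ["澳洲雪黎", "悉尼"])` puts "雪黎" first, because under this toy table it romanizes identically to the query, while "悉尼" and noise n-grams such as "澳洲" receive lower scores. A syllable-level or acoustic similarity measure would likely separate true transliterations from noise better than this character-string ratio.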