We introduce a method for learning to find domain-specific translations for a given term on the Web. In our approach, the source term is transformed into an expanded query aimed at maximizing the probability of retrieving translations from a very large collection of mixed-code documents. The method involves automatically generating sets of target-language words from training data in specific domains and automatically evaluating these target words for their effectiveness in retrieving documents that contain the sought-after translations. At run time, the given term is transformed into an expanded query and submitted to a search engine, and ranked translations are extracted from the document snippets that the search engine returns. We present a prototype system, TermMine, which applies the method on top of Web search engines. Evaluations on a set of terms show that TermMine outperforms state-of-the-art machine translation systems.
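In this paper, we propose a new method for extracting translations of domain-specific terms from the Web. Our method first transforms a source-language term into an expanded query, in order to increase the chance that a search engine returns documents containing translations, so that the relevant translations can be accurately extracted from the document snippets. For each knowledge domain, we train in advance a set of target-language keywords belonging to that domain. These domain keywords help us effectively collect, from the Web, documents that contain domain-relevant translations. At run time, we expand the term to be translated with domain-relevant keywords to form an effective query, submit it to a search engine, and extract the corresponding translations from the results. We implemented the proposed method as a translation system named TermMine; experimental and evaluation results show that the proposed method does effectively improve the translation of domain-specific terms.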
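The run-time step described above (expand the query, then rank translations found in the returned snippets) can be illustrated with a minimal Python sketch. This is not the TermMine implementation. The function names, the simple co-occurrence count used for ranking, and the assumption that domain keywords have already been learned and that snippets have already been fetched are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Runs of two or more CJK ideographs are treated as translation candidates.
# This is an illustrative heuristic, not the extraction rule from the paper.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]{2,}")


def expand_query(term, domain_keywords, k=2):
    """Build an expanded query: the quoted source term plus the top-k
    target-language domain keywords (assumed to be learned beforehand)."""
    return " ".join([f'"{term}"'] + list(domain_keywords[:k]))


def rank_candidates(term, snippets, window=20):
    """Count CJK runs that appear within `window` characters of the source
    term in each snippet and return them sorted by frequency."""
    counts = Counter()
    pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            start = max(0, match.start() - window)
            context = snippet[start:match.end() + window]
            counts.update(CJK_RUN.findall(context))
    return counts.most_common()
```

In use, the string returned by `expand_query` would be submitted to a search engine, and the returned snippets would be passed to `rank_candidates`. A real system would need a stronger ranking signal than raw counts, since frequent domain keywords can also co-occur with the term.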