The Impact of Query Term Translation on Cross-language Information Retrieval (查詢詞翻譯對跨語檢索之影響)

Advisor: 鄭卜壬

Abstract


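Query term translation plays an important role in cross-language information retrieval (CLIR), in which a query in one language is used to find relevant documents written in another language. The main purpose of this thesis is to examine how query terms differ in whether they should or should not be translated for CLIR. Leaving some query terms untranslated causes an irreparable loss in retrieval performance, whereas leaving others untranslated can actually improve it. Based on this observation, the thesis focuses on whether we can predict the probability that a query term should be translated, and on whether that probability helps improve CLIR performance. Our approach uses classification and regression to predict whether a term should be translated, drawing on a set of effective features, including linguistic, statistical, and cross-lingual features. In experiments on two standard test collections, NTCIR-4 and NTCIR-5, we find that the proposed method significantly improves CLIR performance. The thesis also provides an in-depth discussion of OOV query terms and ordinary query terms, and examines the relationship between whether a query term is translated correctly and the quality of its translation.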

Parallel Abstract


Query translation is an important task in cross-language information retrieval (CLIR); it aims to translate queries into the languages used in the documents. The purpose of this paper is to investigate the necessity of translating query terms, which may differ from one term to another. Some untranslated terms cause an irreparable drop in performance, while others do not. We propose an approach that estimates the probability that a query term should be translated, which helps decide whether to translate it. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on the NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis discusses the impact of untranslated out-of-vocabulary (OOV) query terms, and of the translation quality of non-OOV query terms, on CLIR performance. We also scrutinize how translation accuracy relates to translation quality, which ultimately influences translation necessity.
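
The idea of estimating, per query term, the probability that it should be translated can be illustrated with a minimal sketch. The abstract does not specify the models or features, so everything below is an assumption for illustration only: a plain logistic-regression classifier trained by stochastic gradient descent, two hypothetical features (whether the untranslated term already occurs in the target-language collection, and a normalized IDF score), and a handful of made-up training examples.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Linear score of one feature vector."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def train(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with per-example gradient updates."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(score(weights, bias, x)) - y  # gradient of log loss
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def translate_probability(weights, bias, x):
    """Estimated probability that a term with features x should be translated."""
    return sigmoid(score(weights, bias, x))

# Hypothetical features per term (not taken from the thesis):
#   [occurs_untranslated_in_target_collection (0 or 1), normalized_idf]
# Hypothetical label: 1 if translating the term helped retrieval, else 0.
toy_features = [
    [0, 0.9], [0, 0.4], [0, 0.7],
    [1, 0.8], [1, 0.3], [1, 0.6],
]
toy_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(toy_features, toy_labels)

for term_features in ([0, 0.5], [1, 0.5]):
    p = translate_probability(weights, bias, term_features)
    decision = "translate" if p >= 0.5 else "keep untranslated"
    print(term_features, round(p, 3), decision)
```

In this sketch the 0.5 threshold is arbitrary; the estimated probability could equally be used as a soft weight on the translated term rather than as a hard decision.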
