
學習機器翻譯中的雙語重排序模型

Learning Bilingual Linguistic Reordering Model for Statistical Machine Translation

Advisor: 張俊盛

Abstract


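In this thesis, we propose an improved reordering model for BTG-based statistical machine translation. The improvement lies mainly in using bilingual information to derive the features of a Maximum Entropy model. The method first extracts reordering examples from a large word-aligned parallel corpus and then extracts Maximum Entropy features from those examples. We use the extracted features to train a Maximum Entropy reordering model. We evaluate the method on Chinese-to-English translation using the test data released by NIST (United States) in 2006 and 2008, and score the output with the BLEU metric. Experimental results show that our bilingual reordering model outperforms, by a significant margin in BLEU score, both a phrase-based machine translation system and a BTG translation system that uses bilingual words as features. The main contribution of this thesis is that our method uses a very small number of features yet achieves higher translation quality.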
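The extraction step can be illustrated with a minimal sketch. It assumes the usual BTG convention that two adjacent source blocks form a "straight" example when their target sides are adjacent in the same order and an "inverted" example when the order is swapped; the thesis's exact extraction procedure is not given in the abstract, and the names `target_span` and `reordering_label` are hypothetical.

```python
from typing import Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) word links
Span = Tuple[int, int]            # inclusive [start, end] word positions


def target_span(alignment: Alignment, src: Span) -> Optional[Span]:
    """Return the target span covered by a source span, or None if inconsistent."""
    tgt = [t for s, t in alignment if src[0] <= s <= src[1]]
    if not tgt:
        return None  # the source span has no aligned words
    lo, hi = min(tgt), max(tgt)
    # Consistency check: no target word in [lo, hi] may link outside the source span.
    for s, t in alignment:
        if lo <= t <= hi and not (src[0] <= s <= src[1]):
            return None
    return lo, hi


def reordering_label(alignment: Alignment, left: Span, right: Span) -> Optional[str]:
    """Label two adjacent source blocks as 'straight', 'inverted', or None."""
    if left[1] + 1 != right[0]:
        raise ValueError("source blocks must be adjacent")
    a = target_span(alignment, left)
    b = target_span(alignment, right)
    if a is None or b is None:
        return None
    if a[1] + 1 == b[0]:
        return "straight"   # target order matches source order
    if b[1] + 1 == a[0]:
        return "inverted"   # target order is swapped
    return None             # target sides are not adjacent


# Toy alignment (assumed data): source words 0, 1, 2 link to target words 0, 2, 1.
toy_alignment = {(0, 0), (1, 2), (2, 1)}
print(reordering_label(toy_alignment, (0, 0), (1, 2)))  # straight
print(reordering_label(toy_alignment, (1, 1), (2, 2)))  # inverted
```

This sketch ignores unaligned target words between blocks, which a fuller extractor would need to handle.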

Parallel Abstract


In this thesis, we propose a method for learning a reordering model for statistical machine translation (SMT) based on bracketing transduction grammar (BTG). The model focuses on linguistic features extracted from bilingual phrases. Our method extracts reordering examples from aligned parallel sentences, together with features such as part-of-speech and word class. The features are classified with special consideration of phrase length. We then use these features to train a Maximum Entropy (ME) reordering model. With this model, we performed Chinese-to-English translation tasks. Experimental results show that our bilingual linguistic model significantly outperforms state-of-the-art phrase-based and BTG-based SMT systems, as measured by BLEU. Compared to previously proposed lexicalized reordering models, our method not only reduces the number of features by a large margin but also improves translation quality.
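As an illustration of the training step, the sketch below fits a two-class Maximum Entropy model (equivalent to logistic regression) that predicts whether two adjacent blocks should be inverted. It is a sketch under stated assumptions, not the thesis's implementation: the feature names (`src_l_pos=...`, `tgt_r_pos=...`), the toy examples, and the optimizer settings are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[List[str], str]  # (active binary features, "straight" or "inverted")


def p_inverted(weights: Dict[str, float], feats: List[str]) -> float:
    """P(inverted | features) under a two-class maximum entropy model."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples: List[Example], epochs: int = 200,
          lr: float = 0.5, l2: float = 0.01) -> Dict[str, float]:
    """Fit feature weights by batch gradient ascent on the L2-regularised log-likelihood."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        grad: Dict[str, float] = defaultdict(float)
        for feats, label in examples:
            # Gradient of the log-likelihood: observed minus expected feature count.
            err = (1.0 if label == "inverted" else 0.0) - p_inverted(weights, feats)
            for f in feats:
                grad[f] += err
        for f, g in grad.items():
            weights[f] += lr * (g / len(examples) - l2 * weights[f])
    return dict(weights)


# Hypothetical toy data: part-of-speech features on the block boundaries.
TOY_EXAMPLES: List[Example] = [
    (["src_l_pos=DEG", "tgt_r_pos=IN"], "inverted"),
    (["src_l_pos=DEG", "tgt_r_pos=IN"], "inverted"),
    (["src_l_pos=VV", "tgt_r_pos=VB"], "straight"),
    (["src_l_pos=NN", "tgt_r_pos=NN"], "straight"),
]

model = train(TOY_EXAMPLES)
print(p_inverted(model, ["src_l_pos=DEG", "tgt_r_pos=IN"]) > 0.5)  # True
```

Using part-of-speech or word-class features instead of word identities keeps the feature set small, which is the trade-off the abstract highlights.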

