
統計式機器翻譯之二維雙語詞組分斷及對應模式

Two Dimensional Bilingual Phrase Segmentation and Alignment Models for SMT

Advisor: 張景新

Abstract


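In the phrase-based statistical machine translation (Phrase-Based SMT) framework, phrase alignments are derived from word alignments: heuristic methods are applied to the word alignment to find candidate phrase alignments. Although the optimization criteria for word alignment have strong theoretical support [Brown et al., 1990, 1993], the phrase alignment step has no theoretical basis that explains which optimization criterion it satisfies. Whether the overall translation model is optimal therefore remains unknown.

This study proposes an integrated Two Dimensional Bilingual Phrase Segmentation and Alignment Model (2D BPSAM) that extracts, directly from a parallel corpus, phrase alignments that conform to the regularities of both the source and the target language. It does not start from word alignments and then apply heuristics to produce phrase alignments whose optimality is uncertain. The model is then applied to a transliteration task, paired with a simple language model (LM) and decoder, and its performance is compared with that of a translation system built with Moses.

Experimental results show that the 2D BPSAM translation system built from the TM-prune translation model and the Integrated decoding model performs comparably to Moses in Word Accuracy, Mean F-score, and decoding speed, while its phrase translation table is about 77% smaller than that of Moses.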

Parallel Abstract


In phrase-based statistical machine translation, phrase alignments are generated by applying heuristics to word alignments in order to identify candidate phrase pairs. Although word alignment rests on well-founded theoretical criteria, there is no comparable theory showing that the procedure used to produce phrase alignments is optimal. This research proposes an integrated Two Dimensional Bilingual Phrase Segmentation and Alignment Model (2D BPSAM) that finds the best phrase alignment directly from a parallel corpus, in contrast to heuristic extraction from word alignments, whose output cannot be shown to be optimal. We apply the model to a transliteration task and compare its performance with that of Moses. Experimental results show that a 2D BPSAM translation system combining the TM-prune translation model with the Integrated decoding model is close to Moses in Word Accuracy, Mean F-score, and decoding speed, while its phrase translation table is about 77% smaller than that of Moses.
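
For readers unfamiliar with the heuristic baseline that the abstract contrasts with, the following is a minimal sketch of consistency-based phrase-pair extraction from a word alignment, in the general style of Koehn et al. (2003). It is not the 2D BPSAM method and not the Moses implementation. The function name `extract_phrase_pairs`, the `max_len` parameter, and the toy tokens are assumptions made for illustration, and the usual extension over unaligned target words is omitted for brevity.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment (simplified sketch).

    src, tgt  : lists of tokens
    alignment : set of (source_index, target_index) links
    max_len   : hypothetical cap on phrase length, applied to both sides
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any source token in [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no target token in [j1, j2] may be linked
            # to a source token outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example with a crossing alignment (hypothetical data).
toy_src = ["a", "b", "c"]
toy_tgt = ["x", "y", "z"]
toy_alignment = {(0, 0), (1, 2), (2, 1)}

for pair in sorted(extract_phrase_pairs(toy_src, toy_tgt, toy_alignment)):
    print(pair)
```

In this toy case the source span "a b" yields no pair, because its target span covers "y", which is linked to "c" outside the span. The extracted table depends entirely on the word alignment and on this consistency rule, which is the kind of heuristic step the abstract describes as lacking a stated optimization criterion.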

References


[Brown et al., 1990] Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., Mercer, R. L., & Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.
[Brown et al., 1993] Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.
[Koehn et al., 2003] Koehn, P., Och, F. J., & Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Canada.
[Koehn et al., 2006] Koehn, P., Federico, M., Shen, W., Bertoldi, N., Bojar, O., Callison-Burch, C., Cowan, B., Dyer, C., Hoang, H., Zens, R., Constantin, A., Moran, C. C., & Herbst, E. (2006). Open source toolkit for statistical machine translation: Factored translation models and confusion network decoding. Final Report of the 2006 JHU Summer Workshop.
[Och et al., 1999] Och, F. J., Tillmann, C., & Ney, H. (1999). Improved alignment models for statistical machine translation. In Proc. of the Joint Conf. of Empirical Methods in Natural Language Processing and Very Large Corpora (pp. 20–28).
