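Bilingual Noun Phrase Extraction With Phrase-Based Translation Model | National Tsing Hua University Theses and Dissertations Full-Text Image System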

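Author (Chinese):	張嘉銘
Author (English):	Jia-Ming Chang
Thesis title (Chinese):	片語翻譯模型為本之雙語名詞片語擷取
Thesis title (English):	Bilingual Noun Phrase Extraction With Phrase-Based Translation Model
Advisor (Chinese):	張俊盛
Advisor (English):	Jason S. Chang
Degree:	Master's
Institution:	National Tsing Hua University
Department:	Department of Computer Science
Student ID:	936305
Year of publication (ROC calendar):	95
Academic year of graduation:	94
Language:	English
Number of pages:	58
Keywords (Chinese):	名詞片語、統計式機器翻譯、平行語料庫
Keywords (English):	noun phrase, statistical machine translation, parallel corpus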

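In this thesis, we propose a new method for extracting noun phrase translations from a parallel corpus. Our method first uses a noun phrase identification tool to extract all candidate noun phrases from the source sentence. For each noun phrase, we use an existing word alignment tool to find its partial translation in the target sentence. Next, taking the partial translation as a pivot, we generate the candidate translations that contain the pivot. Finally, we use a phrase translation model to select the most probable translation. The phrase translation model comprises two probabilities: the lexical translation probability and the fertility probability. The lexical translation probability measures the degree of association between words, while the fertility probability gives the probability of the number of words that a source word translates into. In the training stage, we estimate these two sets of parameters with the EM algorithm and a probabilistic dictionary, respectively. We implemented the method and, using 740,000 sentences of Hong Kong news as the corpus, compared its noun phrase extraction performance with that of IBM Model 4. In the experiments, we obtained 70% precision and 61% recall. The experiments show that our method outperforms IBM Model 4, and indicate that the proposed method can indeed improve the efficiency and quality of noun phrase translation extraction and of noun phrase handling in machine translation.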
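The EM-based estimation of lexical translation probabilities mentioned above can be illustrated with a minimal sketch. It uses the standard IBM Model 1 style EM updates as a stand-in, which is an assumption: the record does not give the thesis's exact estimation procedure. The function name `train_lexical_translation` and the toy sentence pairs are hypothetical.

```python
from collections import defaultdict


def train_lexical_translation(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 style EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict keyed by (target_word, source_word).
    Assumption: this is a generic stand-in, not the thesis's own estimator.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)

    # Initialise every co-occurring word pair with a uniform probability.
    t = {}
    for src, tgt in pairs:
        for s in src:
            for w in tgt:
                t[(w, s)] = uniform

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words
        # of its sentence in proportion to the current probabilities.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalise the expected counts per source word.
        for (w, s) in t:
            t[(w, s)] = count[(w, s)] / total[s]
    return t


# Hypothetical toy corpus (English source, Chinese target).
toy_pairs = [
    (["small", "house"], ["小", "房子"]),
    (["small", "book"], ["小", "書"]),
    (["a", "book"], ["一本", "書"]),
]
ltp = train_lexical_translation(toy_pairs)
```

In this toy corpus, "小" co-occurs with "small" in two sentences, so EM moves probability mass toward that pair and away from "房子" and "書".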

We propose a new method for automatically extracting noun phrase correspondences from a sentence-aligned bilingual corpus. In our approach, noun phrases extracted from each source-language sentence are aligned to phrases in the corresponding target-language sentence based on a phrase translation model and maximum translation probability. The method involves generating word-level alignments with an existing word alignment technique as the basis for noun phrase alignment, estimating the Lexical Translation Probability (LTP) for noun phrases with the EM algorithm, and estimating the Fertility Probability (FP) from a Most Frequency Translation Equivalent (MFTE). At runtime, for each noun phrase in the source sentence, its partial translation in the target sentence is located. Then, each of the n-grams containing the partial translation is evaluated using the phrase translation probability. The n-gram with the maximum translation probability is chosen as the output. We describe an implementation of the method using a bilingual Hong Kong news corpus. The experimental results show that our model outperforms IBM Model 4 in terms of the precision of noun phrase extraction. The method clearly improves the performance of noun phrase translation, which has been shown to be crucial for statistical machine translation.
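The runtime step described in the abstract (locate a partial translation, enumerate the n-grams that contain it, keep the highest-scoring one) can be sketched as follows. The scoring function is an assumption made for illustration: it averages lexical translation probabilities over the source words and multiplies by a length-based fertility term, which may differ from the model defined in the thesis. All names (`candidates_containing`, `log_score`, `best_translation`) and the probability tables in the usage example are hypothetical.

```python
import math


def candidates_containing(pivot_start, pivot_end, n_tokens, max_len):
    """Yield (start, end) spans, end exclusive, that cover the pivot span."""
    for start in range(0, pivot_start + 1):
        for end in range(pivot_end, n_tokens + 1):
            if end - start <= max_len:
                yield start, end


def log_score(np_tokens, cand_tokens, ltp, fp, floor=1e-6):
    """Log of an assumed phrase translation probability.

    ltp: dict (target_word, source_word) -> lexical translation probability.
    fp:  dict (source_length, target_length) -> fertility (length) probability.
    Unseen events fall back to a small floor value.
    """
    lexical = 0.0
    for w in cand_tokens:
        avg = sum(ltp.get((w, s), 0.0) for s in np_tokens) / len(np_tokens)
        lexical += math.log(max(avg, floor))
    fertility = fp.get((len(np_tokens), len(cand_tokens)), floor)
    return lexical + math.log(max(fertility, floor))


def best_translation(np_tokens, target_tokens, pivot, ltp, fp, max_len=6):
    """Return the n-gram containing the pivot with the highest score."""
    pivot_start, pivot_end = pivot
    spans = candidates_containing(
        pivot_start, pivot_end, len(target_tokens), max_len
    )
    return max(
        (target_tokens[a:b] for a, b in spans),
        key=lambda cand: log_score(np_tokens, cand, ltp, fp),
    )


# Hypothetical example: the word aligner has linked "growth" to "增長",
# so the pivot is the target span [2, 3).
np_tokens = ["economic", "growth"]
target_tokens = ["香港", "經濟", "增長", "放緩"]
ltp = {("經濟", "economic"): 0.8, ("增長", "growth"): 0.7}
fp = {(2, 1): 0.2, (2, 2): 0.6, (2, 3): 0.15, (2, 4): 0.05}
result = best_translation(np_tokens, target_tokens, (2, 3), ltp, fp)
```

Growing the candidate outward from the pivot keeps the search space small: only spans that cover the already-aligned words are scored, rather than every n-gram in the target sentence.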

Abstract (Chinese)
ABSTRACT
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Phrase Translation Model
3.1 Problem Statement
3.2 Phrase-Based Translation Model
3.3 Training the Phrase Translation Model
3.3.1 Data Handling for Training Data
3.3.2 Estimate Lexical Translation Probability
3.3.3 Estimate Fertility Probability according to a MFTE
3.3.4 Estimate Null Probability
3.4 Runtime Noun Phrase Correspondence Extraction
Chapter 4 Experiments and Analysis
4.1 Training the Phrase Translation Model
4.2 Test data and Evaluation
4.2.1 Evaluation for locating pivot
4.2.2 Evaluation for extracting noun phrase correspondence
Chapter 5 Future Work and Conclusion
References
Appendix A - Test Set for Phrase Correspondence Extraction
