
Extraction of Bilingual Multiword Expressions with Application to Bilingual Concordancer

Advisors: 張俊盛 (Jason S. Chang), 陳克健 (Keh-Jiann Chen)

Abstract


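A bilingual concordancer is a computer-assisted translation tool built on a parallel corpus. When the user enters a word or phrase, the bilingual concordancer extracts the sentences containing that word or phrase from the parallel corpus. It then marks where the translation equivalents occur in the aligned sentences and reorders the sentences by translation relevance. This output lets users not only learn the translation equivalents but also study, from the example sentences, how the word or phrase and its translations are used. A bilingual concordancer is therefore a highly practical tool for lexicographers, professional translators, and second-language learners.

Extracting translation equivalents of multi-word expressions is the most important technique in a bilingual concordancer. For example, highlighting translation equivalents and generating a translation equivalents list both depend on high-quality extraction of translation equivalents. So far, however, techniques for extracting translation equivalents still leave considerable room for improvement.

In this thesis, we examine several problems with existing methods for extracting translation equivalents of multi-word expressions, including over-alignment and under-alignment. We propose a new translation-equivalent extraction model that addresses these problems and improves translation quality. Using the proposed model, we also build a working bilingual concordancer as a computer-assisted translation system. To evaluate the system, we test the bilingual concordancer on three different types of multi-word expressions and compare it against existing statistical translation models.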
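The retrieve, highlight, and reorder workflow of a bilingual concordancer can be sketched in Python. This is a minimal sketch, not the extraction model proposed in the thesis: as a stand-in it scores every target-side n-gram by the Dice coefficient of its sentence-level co-occurrence with the query, and it assumes the corpus is already sentence-aligned and tokenized. The names `concordance`, `contains`, `ngrams`, and `Hit`, and the toy corpus, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple

Tokens = List[str]
Ngram = Tuple[str, ...]


class Hit(NamedTuple):
    score: float      # Dice score of the highlighted candidate
    highlight: Ngram  # target n-gram proposed as the translation equivalent
    source: Tokens
    target: Tokens


def contains(tokens: Tokens, query: Tokens) -> bool:
    """True if `query` occurs as a contiguous run of tokens."""
    n = len(query)
    return any(tokens[i:i + n] == query for i in range(len(tokens) - n + 1))


def ngrams(tokens: Tokens, max_n: int) -> List[Ngram]:
    """Distinct contiguous n-grams of length 1..max_n, shortest first."""
    grams = (tuple(tokens[i:i + n])
             for n in range(1, max_n + 1)
             for i in range(len(tokens) - n + 1))
    return list(dict.fromkeys(grams))  # de-duplicate, keep first occurrence


def concordance(corpus: List[Tuple[Tokens, Tokens]], query: Tokens,
                max_n: int = 3) -> List[Hit]:
    # 1. Retrieve the aligned pairs whose source side contains the query.
    pairs = [(src, tgt) for src, tgt in corpus if contains(src, query)]
    # 2. Count, per sentence, each target n-gram with the query and overall.
    joint: Counter = Counter()
    for _, tgt in pairs:
        joint.update(ngrams(tgt, max_n))
    total: Counter = Counter()
    for _, tgt in corpus:
        total.update(ngrams(tgt, max_n))
    # Stand-in association measure (assumption): Dice = 2|Q and G| / (|Q| + |G|).
    dice: Dict[Ngram, float] = {
        g: 2 * joint[g] / (len(pairs) + total[g]) for g in joint}
    # 3. Highlight the highest-scoring n-gram in each target sentence
    #    (max() keeps the first candidate when scores tie).
    hits = []
    for src, tgt in pairs:
        candidates = ngrams(tgt, max_n)
        if candidates:
            best = max(candidates, key=lambda g: dice[g])
            hits.append(Hit(dice[best], best, src, tgt))
    # 4. Reorder the pairs by that score; sorted() is stable for ties.
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Hypothetical toy corpus of (source, target) sentence pairs.
corpus = [(s.split(), t.split()) for s, t in [
    ("take part in the meeting", "參加 會議"),
    ("they take part in sports", "他們 參加 運動"),
    ("we take part in it", "我們 參與 其中"),
    ("the meeting ended", "會議 結束"),
]]
for hit in concordance(corpus, "take part in".split()):
    print(f"{hit.score:.2f}  {' '.join(hit.highlight)}  |  {' '.join(hit.target)}")
```

On this toy corpus the two pairs translated with 參加 rise to the top with a score of 0.80. In the third pair every candidate ties at 0.50, so the sketch highlights the first one (我們) rather than 參與: a bare association score cannot resolve such cases, which is one reason a dedicated extraction model is needed.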

English Abstract


A bilingual concordancer is a computer-assisted translation tool that uses a parallel corpus as its knowledge base. Given a word or phrase, the bilingual concordancer retrieves from the parallel corpus the aligned sentence pairs whose source sentences contain that word or phrase. It then identifies the translation equivalents in the target sentences and reorders the sentence pairs according to the correlation between the query string and the translation equivalents. This helps users not only find translation equivalents of the query but also see the various contexts in which it occurs. As a result, it is extremely useful for bilingual lexicographers, human translators, and second-language learners.

Extraction of bilingual multi-word expressions is the most important part of a bilingual concordancer. For example, highlighting translation equivalents in the target sentence and generating a translation equivalent list both depend heavily on a high-quality extraction model. However, existing models for extracting translation equivalents still have many problems and leave room for improvement.

In this thesis, we discuss problems in existing models for extracting bilingual multi-word expressions, including the over-alignment problem and the under-alignment problem. We then propose a novel model that addresses these problems to improve the quality of the extracted translation equivalents. Further, we implement a bilingual concordancer that employs the proposed translation extraction model. To measure its performance, we use three types of multi-word expressions as test targets and compare the results with those of existing statistical machine translation models.
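Over-alignment and under-alignment can be made concrete by comparing an extracted target span with a reference span. Assuming the usual reading, where an over-aligned span includes words beyond the true translation and an under-aligned span misses part of it, token-level precision and recall separate the two: extra tokens lower precision, missing tokens lower recall. This is a minimal sketch, not the evaluation used in the thesis; the half-open span representation, the names `span_precision_recall` and `diagnose`, and the example sentence are hypothetical.

```python
from typing import Tuple

Span = Tuple[int, int]  # half-open [start, end) token offsets in the target sentence


def span_precision_recall(predicted: Span, gold: Span) -> Tuple[float, float]:
    """Token-level precision and recall of a predicted span against a gold span."""
    overlap = max(0, min(predicted[1], gold[1]) - max(predicted[0], gold[0]))
    pred_len = predicted[1] - predicted[0]
    gold_len = gold[1] - gold[0]
    precision = overlap / pred_len if pred_len > 0 else 0.0
    recall = overlap / gold_len if gold_len > 0 else 0.0
    return precision, recall


def diagnose(predicted: Span, gold: Span) -> str:
    # Assumption: over-/under-alignment read as extra/missing target tokens.
    precision, recall = span_precision_recall(predicted, gold)
    if precision == 0.0:
        return "miss"            # no overlap with the gold span at all
    if precision == 1.0 and recall == 1.0:
        return "exact"
    if recall == 1.0:
        return "over-aligned"    # covers the gold span plus extra tokens
    if precision == 1.0:
        return "under-aligned"   # lies inside the gold span but misses tokens
    return "mixed"               # both extra and missing tokens


# Hypothetical example: "she finally made up her mind"
target = ["她", "終於", "下定", "決心", "了"]
gold = (2, 4)  # 下定 決心
for pred in [(2, 4), (2, 5), (3, 4)]:
    print(" ".join(target[pred[0]:pred[1]]), "->", diagnose(pred, gold))
```

Here "下定 決心 了" is reported as over-aligned and "決心" alone as under-aligned, so a model that fixes one error type at the cost of the other shows up as a trade-off between the two scores.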

