A bilingual concordancer is a computer-assisted translation tool built on a parallel corpus. Given a query word or phrase, it retrieves from the corpus the aligned sentence pairs whose source sentences contain the query. It then marks the translation equivalents in the target sentences and reorders the sentence pairs according to how strongly the query and its translation equivalents are correlated. The output helps users not only find translation equivalents of the query but also study how those translations are used in context. As a result, a bilingual concordancer is a very practical tool for bilingual lexicographers, professional translators, and second language learners.

The extraction of translation equivalents for multi-word expressions is the most important component of a bilingual concordancer. For example, both highlighting translation equivalents in the target sentence and generating a translation equivalents list depend on a high-quality extraction model. However, existing models for extracting translation equivalents still leave considerable room for improvement.

In this thesis, we discuss problems with existing models for extracting bilingual multi-word expressions, including the over-alignment problem and the under-alignment problem. We then propose a novel extraction model that addresses these problems and improves the quality of the extracted translation equivalents. We also implement a bilingual concordancer that employs the proposed model. To measure its performance, we use three types of multi-word expressions as test data and compare the results with those of existing statistical machine translation models.
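
The retrieve, highlight, and rerank workflow described above can be illustrated with a minimal sketch. It is not the extraction model proposed in this thesis: as a stand-in, it scores candidate target words with a simple Dice co-occurrence coefficient and highlights only a single word per sentence. The function names, the scoring choice, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy parallel corpus: (source tokens, target tokens) pairs.
TOY_CORPUS = [
    ("the meeting will take place tomorrow".split(), "會議 將 於 明天 舉行".split()),
    ("the election will take place in may".split(), "選舉 將 於 五月 舉行".split()),
    ("the meeting was cancelled".split(), "會議 被 取消 了".split()),
    ("he will arrive tomorrow".split(), "他 將 於 明天 抵達".split()),
]


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous token sequence in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def concordance(corpus, query):
    """Toy retrieve / highlight / rerank loop.

    Assumption: Dice co-occurrence scoring stands in for a real
    translation-equivalent extraction model.
    """
    phrase = query.split()

    # Step 1: retrieve sentence pairs whose source side contains the query.
    hits = [(src, tgt) for src, tgt in corpus if contains_phrase(src, phrase)]

    # Sentence-level counts: in how many target sentences each word appears,
    # over the whole corpus and over the retrieved pairs only.
    target_freq = Counter(w for _, tgt in corpus for w in set(tgt))
    cooc = Counter(w for _, tgt in hits for w in set(tgt))

    def dice(word):
        return 2 * cooc[word] / (len(hits) + target_freq[word])

    results = []
    for src, tgt in hits:
        if not tgt:
            continue  # nothing to highlight in an empty target sentence
        # Step 2: highlight the best-scoring target word (a simplification;
        # real equivalents are often multi-word).
        best = max(tgt, key=dice)
        results.append((dice(best), best, src, tgt))

    # Step 3: rerank so the most strongly associated pairs come first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


for score, best, src, tgt in concordance(TOY_CORPUS, "take place"):
    print(f"{score:.2f}", best, " ".join(src), "|", " ".join(tgt))
```

On this toy corpus, the query "take place" retrieves two sentence pairs and highlights 舉行 in both. A word-level score like this cannot mark multi-word equivalents, and it is prone to errors of the over-alignment and under-alignment kind that the thesis sets out to address.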