

An Online Chinese-English Translation Retrieval System for Near Synonymous Sentences



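The most common problem in translation is how to render an unfamiliar phrase or sentence. This paper explores how a Chinese-English parallel corpus can be used to build an online computer-aided translation tool. To use the system, the user opens the system's web page in a browser and enters a Chinese or English phrase or sentence. The system then displays aligned Chinese-English sentence pairs from the parallel corpus, ranked by the number and importance of the words they share with the input. The system first uses statistics and a machine-readable Chinese-English bilingual dictionary to extract aligned sentences from the parallel corpus automatically. It then segments the Chinese documents with a word segmentation program and records the file and position in which each Chinese and English word occurs (that is, it builds an index file). Sentence similarity is judged by term weights, which can be obtained with TF (Term Frequency) * IDF (Inverse Document Frequency), a formula widely used in information retrieval. The underlying assumption is that a word's importance is proportional to the number of times it occurs in a given document but inversely proportional to the proportion of all documents in which it occurs. Using the automatically built index, the system finds every sentence that shares words with the input sentence, sums the TF * IDF weights of all shared words, and ranks the related sentences and their translations accordingly. We made a prototype of the system available to students in a translation course, and preliminary results suggest that such a system is of considerable benefit to the teaching and learning of translation.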
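
The ranking rule described above can be written compactly. The abstract does not give the exact form of the weights, so the logarithmic IDF below is an assumption (a common choice in information retrieval), and the symbols q (segmented query), s (candidate sentence), N (number of indexed sentences), and df(t) (number of sentences containing term t) are introduced here for illustration only:

```latex
w(t, s) = \mathrm{tf}(t, s) \cdot \log\frac{N}{\mathrm{df}(t)},
\qquad
\mathrm{score}(q, s) = \sum_{t \,\in\, q \cap s} w(t, s)
```

Under this reading, a sentence scores higher the more words it shares with the query, and rare shared words contribute more than words that occur throughout the corpus.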


Searching for an appropriate expression in a foreign language is often like looking for a needle in a haystack. It is time-consuming, laborious, and often turns out to be in vain. With the recent development of corpus-based computational linguistics, a new approach to tackling this thorny problem has emerged. The approach draws on a bilingual concordancer, a tool that accepts keywords in either the source or the target language and retrieves examples and their translations from large bilingual text databases (i.e. corpora). The tool greatly facilitates the retrieval of unfamiliar expressions. With it, learners and translators can learn how to express themselves in a foreign language by entering an expression in their native language and inspecting the translation examples. The potential help of a bilingual concordancer to language learners and translators is thus enormous. Unfortunately, the technical difficulties involved in finding sentence correspondences (i.e. sentence alignment) in bilingual texts make bilingual concordancers difficult to implement, which explains why they are not generally available to learners and translators in Taiwan. In this paper, we elaborate on how to develop an online tool for computer-aided translation. We describe how sentence correspondences in a bilingual corpus can be established using electronic dictionaries and statistical algorithms. We further discuss the procedures for constructing a web-based Chinese-English translation retrieval system on the basis of a sentence-aligned bilingual corpus. The system is more powerful and sophisticated than a bilingual concordancer. It can take a keyword, phrase, or sentence in the source or target language as input and retrieve the closest translations if no translation equivalents of the input expression are found. Central to this technique is the formulation of a measure of semantic similarity based on the term weights of the words in the input query.
The system performs word segmentation on the input query and then uses TF * IDF (term frequency * inverse document frequency) to calculate the weight of each word. Using the proposed term weighting method, it is capable of retrieving and ranking synonymous or conceptually similar sentences along with their translations. Limitations of this approach and directions for future improvement are discussed in the paper.
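
As a rough illustration of the retrieval step, the following Python sketch builds an inverted index over a toy sentence-aligned corpus and ranks sentences by the summed TF * IDF weight of the words they share with a query. It is a minimal sketch and not the authors' implementation: the corpus, the function names `build_index` and `rank`, and the logarithmic IDF are all assumptions made for this example, and whitespace splitting stands in for the output of a real word segmentation program.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (source tokens, aligned translation) pairs.
# Whitespace tokens stand in for the output of a word segmenter.
CORPUS = [
    ("he broke the window".split(), "他打破了窗戶"),
    ("she opened the window".split(), "她打開了窗戶"),
    ("he broke his promise".split(), "他違背了諾言"),
]


def build_index(corpus):
    """Map each term to {sentence id: term frequency in that sentence}."""
    index = defaultdict(dict)
    for sid, (tokens, _translation) in enumerate(corpus):
        for term, tf in Counter(tokens).items():
            index[term][sid] = tf
    return index


def rank(query_tokens, corpus, index):
    """Return (sentence id, score) pairs, best match first.

    The score is the sum of tf * idf over terms shared with the query,
    with idf = log(N / df) as an assumed weighting.
    """
    n = len(corpus)
    scores = defaultdict(float)
    for term in set(query_tokens):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(n / len(postings))
        for sid, tf in postings.items():
            scores[sid] += tf * idf
    # Sort by descending score; break ties by sentence id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    idx = build_index(CORPUS)
    for sid, score in rank("he broke the glass".split(), CORPUS, idx):
        tokens, translation = CORPUS[sid]
        print(f"{score:.3f}  {' '.join(tokens)}  |  {translation}")
```

The query "he broke the glass" has no exact match in the toy corpus, yet the sentence sharing the most query words is ranked first. This mirrors the behavior the abstract describes, where the closest translations are returned when no exact equivalent is found.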






