Recent advances in artificial intelligence have substantially improved the output of neural machine translation (NMT) systems, with notable gains in both readability and comprehensibility. Although NMT systems can now handle lexical and syntactic problems that statistical machine translation (SMT) systems could not, they still have limitations and require human intervention, in the form of pre-editing or post-editing, to improve translation quality. Because academic writing is relatively fixed in form and its sentence patterns follow fairly regular rules, it is considered well suited to machine translation (MT) as a translation aid. This study therefore analyzes the Google Translate (GT) English translations of 50 Chinese abstracts in the education (ED) field, selected as the most-viewed abstracts with a bullet-listed format on the website of the National Digital Library of Theses and Dissertations (NDLTD) in Taiwan. It identifies the most common errors in the GT output and tests how effectively different pre-editing techniques improve GT translation quality. The text analysis shows that Chinese academic texts in the ED field mostly use a bullet-listed format and frequently rely on long sentences linked by multiple semicolons or enumerations. Together with differences between Chinese and English in word usage, sentence structure, and research terminology, these features lead to GT errors at the lexical, syntactic, and other levels, each of which affects readability and translation quality to some extent. The results show that 12 pre-editing techniques effectively address these errors and can likewise be grouped into lexical, syntactic, and other levels. Among them, expanding abbreviations into full forms, splitting long sentences into shorter ones, rewriting sentences in the passive voice, and labeling list items with Arabic numerals, English letters, or Roman numerals help reduce the errors that MT systems make when processing academic texts and significantly improve the readability and comprehensibility of the output. The findings can help students and researchers understand how to use MT and how to control the Chinese source text of academic writing effectively in order to produce English translations that meet the expected standard.