Using Chinese-English Parallel Corpora for Compiling Bilingual Legal Glossaries

(Chinese title: 從平行語料庫編纂漢英法律雙語詞彙)

Advisor: 高照明

Abstract


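A bilingual glossary helps translators handle domain-specific translation and maintain consistency. This thesis aims to provide an effective semi-automatic method for extracting bilingual phrase pairs for translators, and to establish a systematic mechanism for compiling bilingual specialized glossaries. First, two natural language processing tools, Anymalign and Pialign, were used to obtain Chinese-English candidate words and phrases with high alignment probabilities from parallel corpora of the criminal laws of Taiwan and Mainland China. Because the output of these tools is imperfect, the study then set out to improve the extraction results, beginning with the English side: a part-of-speech tagger automatically tagged the English corpus to locate nouns and noun phrases. Linguistic patterns were then used to filter out candidate phrases that do not conform to the composition patterns of Chinese and English noun phrases. This yielded 1,852 valid Chinese-English legal phrases for Taiwan and 3,782 for Mainland China, including longer Chinese-English noun phrases. From these results, the pairs whose Chinese and English units correspond correctly were selected: 694 for Taiwan and 852 for Mainland China. In addition, an American reference corpus was used with a keyness threshold of LLR ≥ 3.84 to separate terms from general noun phrases. Of Taiwan's 487 English nouns or noun phrases, 394 qualified as terms; of Mainland China's 517, only 418 did. The valid nouns and noun phrases were further processed into a comparable list that shows the similarities and differences between the two sides of the Strait in Chinese legal wording and its English translation.

In terms of extraction performance, English noun phrases proved far easier to extract than Chinese ones, because the nominal status of a Chinese word becomes apparent only within a sentence, whereas English suffixes are distinctive and make noun-phrase extraction more efficient. For the two major classes of Chinese noun phrases, those with or without 之 (Taiwan) and those containing 的 or 之 (Mainland China), neither the phrase structure nor its English translation follows a consistent pattern, which makes complete bilingual extraction considerably harder. As for term filtering, the longer a noun phrase becomes, the more often it combines terms with general words, which makes terms harder to isolate. Moreover, the criminal laws of the two sides are not identical in substance, and incompletely extracted units easily produce spurious similarities or differences between phrase pairs.

The methods and results of this study may inform bilingual noun-phrase extraction for other language pairs or other specialized domains, supporting the compilation of bilingual terminology glossaries and the localization of translation.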
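The English-side filtering step described above (POS-tag the English half of each candidate pair, then discard candidates whose tag sequence does not fit a noun-phrase pattern) can be sketched in Python. This is a minimal illustration, not the thesis's actual rule set. It assumes the candidates are already tagged with Penn Treebank tags, the tagset the Stanford POS Tagger emits for English. The pattern itself, adjectives or nouns ending in a noun and optionally extended by "of" plus another such phrase, is an assumption, as are the function names `is_noun_phrase` and `filter_candidates` and the hand-made sample pairs.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # [(token, Penn Treebank tag), ...]

# Hypothetical coarse classes built from Penn Treebank tags.
ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Assumed NP pattern over one symbol per token:
#   (Adj|Noun)* Noun, optionally followed by "of" + optional determiner + (Adj|Noun)* Noun
NP_PATTERN = re.compile(r"[AN]*N(?:PD?[AN]*N)?")


def to_symbols(tagged: Tagged) -> str:
    """Collapse (token, tag) pairs into a one-character-per-token string."""
    out = []
    for token, tag in tagged:
        if tag in NOUN_TAGS:
            out.append("N")
        elif tag in ADJ_TAGS:
            out.append("A")
        elif tag == "IN" and token.lower() == "of":
            out.append("P")
        elif tag == "DT":
            out.append("D")
        else:
            out.append("X")  # any other tag makes the pattern fail
    return "".join(out)


def is_noun_phrase(tagged: Tagged) -> bool:
    """True if the whole tag sequence matches the assumed NP pattern."""
    return bool(tagged) and NP_PATTERN.fullmatch(to_symbols(tagged)) is not None


def filter_candidates(pairs: List[Tuple[str, Tagged]]) -> List[Tuple[str, Tagged]]:
    """Keep (Chinese, tagged English) pairs whose English side is an NP."""
    return [(zh, en) for zh, en in pairs if is_noun_phrase(en)]


# Illustrative, hand-made pairs (not taken from the thesis corpora).
candidates = [
    ("刑事責任", [("criminal", "JJ"), ("liability", "NN")]),
    ("追訴權時效", [("statute", "NN"), ("of", "IN"), ("limitations", "NNS")]),
    ("處", [("shall", "MD"), ("be", "VB"), ("punished", "VBN")]),
]

for zh, en in filter_candidates(candidates):
    print(zh, " ".join(tok for tok, _ in en))
```

A regular expression over one symbol per token keeps the rules easy to extend; a real system would likely need more patterns, for example participial modifiers or noun-noun compounds joined by other prepositions.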

Parallel Abstract


Bilingual glossaries help translators produce accurate domain-specific translations and keep them consistent. This study proposes a semi-automatic method for extracting bilingual pairs and a systematic procedure for compiling specialized glossaries that translators can follow. First, two natural language processing tools, Anymalign and Pialign, were used to extract Chinese-English candidate pairs with high translation probabilities from parallel corpora of the criminal laws of Taiwan and Mainland China. Because these extraction tools are imperfect, the study focuses on improving the quality of the preliminary results, starting with the identification of qualified English nouns and noun phrases. English words and phrases were automatically assigned part-of-speech tags by the Stanford POS Tagger, and this linguistic information was used to identify and remove non-noun patterns so that qualified English noun phrases (NPs) could be located. This yielded 1,852 qualified English NPs with their Chinese counterparts for Taiwan and 3,782 for Mainland China, including NPs with longer word units. From these candidates, 694 bilingual pairs (Taiwan) and 852 pairs (Mainland China) were identified as correctly aligned. An American reference corpus was then used to mark out terms by their keyness scores, with the threshold set at a log-likelihood ratio of at least 3.84. Of the 487 qualified nouns and noun phrases from Taiwan, 394 were identified as terms; of the 517 from Mainland China, 418 were. The qualified nouns, noun phrases, and terms were used to produce a comparable list that shows the similarities and differences in Chinese legal expressions and their English translations across the Taiwan Strait.

Improving the extraction of English NPs proved much more effective than improving that of Chinese NPs, because the part of speech of a Chinese word becomes clear only in context. English suffixes, by contrast, are distinctive, so identifying them through part-of-speech tags made English noun and NP extraction more effective. Extracting complete bilingual pairs proved difficult for two reasons. First, no regular patterns could be identified in the two main groups of Chinese NPs: NPs with or without 之 zhi (Taiwan) and NPs containing 的 de or 之 zhi (Mainland China). Second, the English translations corresponding to these Chinese NPs were not predictable either. Filtering terms proved even more difficult: the longer an NP is, the more likely it is to mix terms with common words. In addition, legal concepts are not entirely the same in the legal systems of Taiwan and Mainland China, and when the word units of a phrase are extracted incompletely, falsely similar or falsely different bilingual pairs result. The methodology and findings of this study may be applicable to other language pairs and other specialized domains, and may help improve the compilation of bilingual term glossaries and the localization of translations.
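The abstract states only the threshold (LLR ≥ 3.84). A common way to compute keyness is the standard corpus-linguistics log-likelihood statistic, and the sketch below assumes that formulation. For an item with frequency a in a legal corpus of c tokens and frequency b in a reference corpus of d tokens, the expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and G² = 2(a·ln(a/E1) + b·ln(b/E2)). The value 3.84 is the chi-square critical value for one degree of freedom at p < 0.05. The extra requirement that the item be relatively more frequent in the legal corpus (positive keyness) is a common convention added here as an assumption; the function names and the counts in the usage line are hypothetical.

```python
import math

# Chi-square critical value for 1 degree of freedom at p < 0.05.
CRITICAL_VALUE = 3.84


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 keyness of an item seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    # Terms with an observed frequency of 0 contribute 0 (0 * ln 0 -> 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def is_key_term(a: int, b: int, c: int, d: int,
                threshold: float = CRITICAL_VALUE) -> bool:
    """Assumed rule: over-represented in the target corpus AND G2 >= threshold."""
    overused = a / c > b / d
    return overused and log_likelihood(a, b, c, d) >= threshold


# Hypothetical counts: 50 hits in a 10,000-token legal corpus vs.
# 10 hits in a 100,000-token reference corpus.
print(round(log_likelihood(50, 10, 10_000, 100_000), 2),
      is_key_term(50, 10, 10_000, 100_000))  # prints 187.63 True
```

Because G² is two-sided, an item that is much rarer in the legal corpus than in the reference corpus can also exceed 3.84. This is why the sketch checks relative frequency before treating the item as a term candidate.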

