Title

半自動擷取中文動補式複合動詞及其英文翻譯

Translated Titles

Semi-Automatic Identification of Chinese Resultative Verb Compounds and Their English Translation Equivalents

DOI

10.6342/NTU202000527

Authors

李小慧

Key Words

動補式複合動詞 ; 語料庫翻譯研究 ; 半監督機器學習 ; 條件隨機域 ; 機器翻譯 ; RVC ; resultative verb compound ; corpus-based translation studies ; semi-supervised machine learning ; conditional random fields ; machine translation

Publication Name

National Taiwan University, Graduate Program in Translation and Interpretation: Degree Thesis

Volume or Term/Year and Month of Publication

2020

Academic Degree Category

Master's

Advisor

高照明

Content Language

English

Chinese Abstract

中譯英受到中英文句子結構和語法的巨大差異而變得複雜,其中一個難點是由動詞和結果補語所形成的複合動詞,又稱為動補式複合動詞(RVC)。RVC大部分是兩個字的組合,其中第一個字表示某種動作或方式的動詞,第二個字則表示結果、方向、或程度(例如,吵醒, 跌下,讀熟等)。關於中文動補式複合動詞(RVC)的形成歷史或在現代漢語語法中的功能,過去已有大量的研究,但是在語料庫語言學或翻譯研究領域中,很少有相關的研究。 本研究著重於RVC的辨識和翻譯。我們選擇中國大陸作家姜戎所著的小說《狼圖騰》和漢學家Howard Goldblatt所翻譯的英文本Wolf Totem作為語料。中文原著和英文翻譯以人工方式進行段落對齊,成為一個平行語料庫,並以原著的前18章進行辨識RVC的實驗,我們採用了半監督機器學習方法,以及CRF++套件。我們首先以人工方式擷取原著第一章中所有的動補式複合動詞(RVC)標記作為CRF++ 套件中的種子,然後將前18章中將某些關鍵特徵(包括詞性,單詞中的字符位置)附加到每個字符上,以創建訓練文件。我們發現相對NLPIR詞性標記系統中的“主類”,NLPIR的“次類”標記有較高的正確率。在辨識RVC後,我們創建了一個程式界面,並利用多語詞對應程式Anymalign自動找到這些RVC的英文翻譯,雖然由於語料屬於文學性質,該程式無法找到許多的RVC,但是程式可以讓翻譯研究者和譯者從已經段落對齊的中英平行語料找到動補式複合動詞(RVC)在不同語境下的各種不同的翻譯。

English Abstract

Drastic differences in sentence structure and grammar complicate Chinese-to-English translation. One inconspicuous grammatical feature of Mandarin Chinese is a particular obstacle to accurate English translation: the resultative verb compound (RVC). An RVC is a combination of characters (often, but not always, a pair) in which the first character is a verb expressing some action or manner and the second expresses a result, direction, or extent (e.g., 打斷, 坐下, 讀熟). A vast amount of research has been conducted on the history of RVC formation and on the function of RVCs in modern Chinese grammar, but little to no serious research has addressed RVCs in corpus linguistics or translation studies. This study therefore focuses on the identification and subsequent translation of RVCs, based on the Chinese novel《狼圖騰》by Jiang Rong and its English translation, Wolf Totem, by Howard Goldblatt. We manually aligned the two texts at the paragraph level to form a parallel corpus. To identify RVCs within the first 18 chapters of the novel, we adopted a semi-supervised machine learning method using the CRF++ toolkit. We first manually tagged the RVCs in the first chapter to act as seeds. We then created a training file by affixing certain key features to each character in the first 18 chapters: NLPIR and NAER part-of-speech tags; B, I, and E tags marking whether the character falls at the beginning, middle, or end of a word; and the RVC seeds. With this file we generated thousands of predicted RVCs in two separate experiments (Experiment 1 used NLPIR “parent” tags, and Experiment 2 used NLPIR “child” tags). We found that the NLPIR “child” tags produced more accurate results than the NLPIR “parent” tags. After identifying the RVCs, we created an interface to find their English translations using Anymalign, a multilingual word aligner. Although the program was unable to find many of the RVCs because of their low frequency and the literary nature of the text, the interface allows translation researchers and working translators to manually identify translation equivalents of Mandarin Chinese RVCs and to study their different translations in the previously aligned parallel corpus.
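The character-level training file described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into (word, part-of-speech) pairs and writes one row per character in the whitespace-separated column layout that CRF++ reads, with the label in the last column and a blank line between sentences. The function names, the part-of-speech values, the "S" tag for one-character words, and the "RVC"/"O" label scheme are illustrative assumptions, not details taken from the thesis.

```python
from typing import Iterable, List, Set, Tuple


def position_tags(word: str) -> List[str]:
    """Return B/I/E position tags for the characters of one word.

    Assumption: a one-character word gets "S"; the abstract only
    mentions B, I, and E and does not say how such words are handled.
    """
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["I"] * (len(word) - 2) + ["E"]


def crf_rows(sentence: Iterable[Tuple[str, str]], seeds: Set[str]) -> List[str]:
    """Build one tab-separated row per character: char, POS, position, label.

    Assumption: a word is labelled "RVC" only when the whole segmented
    word matches a manually tagged seed; everything else is "O".
    """
    rows = []
    for word, pos in sentence:
        label = "RVC" if word in seeds else "O"
        for char, tag in zip(word, position_tags(word)):
            rows.append("\t".join([char, pos, tag, label]))
    return rows


def crf_training_text(sentences, seeds: Set[str]) -> str:
    """Join sentences with blank lines, the sentence boundary CRF++ expects."""
    return "\n\n".join("\n".join(crf_rows(s, seeds)) for s in sentences) + "\n"


if __name__ == "__main__":
    # Hypothetical segmented sentence; the POS values are placeholders,
    # not actual NLPIR or NAER tags.
    example = [("他", "r"), ("坐下", "v"), ("了", "u")]
    print(crf_training_text([example], seeds={"坐下"}), end="")
```

Matching seeds against whole segmented words is the simplest possible choice; an RVC that the segmenter splits into two words would be missed by this sketch and would need a separate span-matching step.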

Topic Category

Humanities > Linguistics
College of Liberal Arts > Graduate Program in Translation and Interpretation

References

  1. Anthony, L. (2017). AntPConc (Version 1.2.1) [Computer Software]. Tokyo, Japan: Waseda University. Available from https://www.laurenceanthony.net/software
  2. Baker, M. (1993). Corpus Linguistics and Translation Studies: Implications and Applications. In Mona Baker, Gill Francis & Elena Tognini-Bonelli (Eds.) Text and Technology: In Honour of John Sinclair (pp. 233–50). Amsterdam: John Benjamins B.V.
  3. Bishop, C. (2006). Pattern Recognition and Machine Learning. New York: Springer Publishing.
  4. Chao, Y.R. (1968). A Grammar of Spoken Chinese. Berkeley: University of California Press.
  5. Cook, A. (2014). A Linguistic Analysis of Selected Morpho-syntactic Features of Spoken Mandarin (Doctoral dissertation). Griffith University.
  6. Deng, X. J. (2010). The Acquisition of the RVC in Mandarin Chinese (Master’s Dissertation). Retrieved from http://www.cuhk.edu.hk/lin/new/people/students/dengxiangjun/doc/DengXiangjun2010_thesis.pdf
  7. EMT Expert Group (2009). Competences for professional translators, experts in multilingual and multimedia communication. Retrieved from http://ec.europa.eu/dgs/translation/programmes/emt/key_documents/emt_competences_translators_en.pdf
  8. Gao, Z. M. (2011). Exploring the effects and use of a Chinese-English Bilingual Concordancer. Computer-Assisted Language Learning, 24 (3), 255-275.
  9. Grover, K. (2014). V1-le vs. RVC-le in expressing resultant state in learners’ Mandarin interlanguage: evidence of two states of mind? LSA Annual Meeting Extended Abstracts.
  10. House, J. (2014). Translation Quality Assessment: Past and Present. In Juliane House (Ed.), Translation: A Multidisciplinary Approach (ch. 13, pp. 241-264). London: Palgrave Macmillan.
  11. Jiang, R. (2004). 狼圖騰 [Wolf Totem]. Wuchang: Changjiang Arts Publishing House.
  12. Jiang, R. (2008). Wolf Totem (H. Goldblatt, Trans.). New York: Penguin Press. (Original work published in 2004).
  13. Kawaguchi, Y., Takagaki, T., Tomimori, N., & Tsuruga, Y. (Eds.) (2007). Corpus-based Perspectives in Linguistics. John Benjamins Publishing.
  14. Ke, Z. (1995). The Syntax of the Chinese BA-Constructions and Verb Compounds: a Morpho-Syntactic Analysis (Doctoral dissertation). University of Southern California, Los Angeles. Retrieved from http://digitallibrary.usc.edu/cdm/ref/collection/p15799coll17/id/481164
  15. Kudo, T. (2003). CRF++ [Computer Program]. Retrieved from https://taku910.github.io/crfpp/#tips
  16. Lafferty, J., McCallum, A., and Pereira, F. C. N. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the 18th International Conference on Machine Learning 2001 (pp. 282-289).
  17. Lardilleux, A. and Lepage, Y. (2009). Sampling-based multilingual alignment. In International Conference on Recent Advances in Natural Language Processing RANLP-2009 (pp. 214-218).
  18. Li, C. N. and Thompson, S. A. (1981). Mandarin Chinese: A Functional Reference Grammar. Los Angeles: University of California Press.
  19. Li, W. S. (2008). The First Language Influence on the Second Language Acquisition of Mandarin Resultative Verb Compounds (Master’s Dissertation). Retrieved from National Digital Library of Theses and Dissertations in Taiwan.
  20. Lü, S. X. (1955). The Papers on the Chinese Grammar. Beijing: Science Press.
  21. Manning, C. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
  22. McEnery, T. (2003). Corpus linguistics. In Ruslan Mitkov (Ed.) Oxford Handbook of Computational Linguistics (pp. 448-463). Oxford: Oxford University Press.
  23. Nida, E. A. (2001). Dynamic Equivalence in Translating. In Chan Sin-Wai & David E. Pollard (Eds.) An Encyclopaedia of Translation (pp. 223-230). Hong Kong: The Chinese University of Hong Kong.
  24. OpenCC [Computer software]. (2013). Retrieved from https://github.com/BYVoid/OpenCC
  25. Powers, D. M. W. (2011). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. In Journal of Machine Learning Technologies, 2 (1), 37–63.
  26. Pym, A. (2013). Translation Skill-Sets in a Machine-Translation Age. In Translators' Journal, 58 (3), 487-503.
  27. Riesa, J., Irvine, A. and Marcu, D. (2011). Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation. In Proceedings of EMNLP, pp. 497-507.
  28. Rojo, A. (2013). Review of Michael P. Oakes and Meng Ji (Eds.) Quantitative Methods in Corpus-Based Translation Studies: A Practical Guide to Descriptive Translation Research. In Journal of Research Design and Statistics in Linguistics and Communication Science 1 (1).
  29. Roten, T. (2009). PyNLPIR Documentation.
  30. Roturier, J. (2015). Localizing Apps: A Practical Guide for Translators and Translation Students. New York: Routledge.
  31. Sun, C. F. (2013). Chinese Resultative Verb Compounds: Lexicalization and Grammaticalization. In Breaking Down the Barriers, pp. 625-649.
  32. Tai, H.Y. (2003). On the Equivalent of ‘kill’ in Mandarin Chinese. In Journal of the Chinese Language Teachers Association, 10 (2), 48-52.
  33. Tai, H.Y. (1975).
  34. Tang, T. C. (1989). 漢語詞法與兒童語言習得I:漢語動詞 [Chinese morphology and child language acquisition, I: Verbs in Chinese]. In Studies in Chinese Morphology and Syntax, 43-92. Taipei: Student Books.
  35. Thompson, S. (1973). Resultative Verb Compounds in Mandarin Chinese: A Case for Lexical Rules. In Language, 49 (2), 361-379. Linguistic Society of America.
  36. Wang, L. (1954). Chinese Modern Grammar. Beijing: Zhonghua Press.
  37. Yong, S. (1997). The grammatical functions of verb complements in Mandarin Chinese. In Linguistics 35(1), 1-24.
  38. Zhang, K. (2018). Natural Language Processing and Information Retrieval System Platform [Computer Program]. Retrieved from http://ictclas.nlpir.org/index_e.html