

Applying the Corpus of Contemporary Taiwanese Mandarin in Teaching Chinese as a Second Language


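The Corpus of Contemporary Taiwanese Mandarin (COCT), compiled by the National Academy for Educational Research, contains written, spoken, Chinese-English bilingual, and Chinese learner interlanguage data. The main purpose of this paper is to apply the COCT to the grading of Chinese characters, words, and grammar points, and to the development of integrated corpus application systems. Using statistical results from the corpus on word frequency, coverage, distribution uniformity, quasi-affixes, semantic-field-related words, word-formation rate, and character-composition capacity, supplemented by consultation with scholars, experts, and experienced Chinese-language teachers, the paper completes grading standards for Chinese characters, words, and grammar points. In addition, by integrating these grading standards with corpus technology, it builds a Corpus Concordance System, a Semantic-Field-Related Word Query System, an Automatic Typo Correction System for Compositions, and an Example Sentence Editing Assistance System. Finally, the paper offers suggestions for future COCT research on building a general word-frequency list, constructing a basic vocabulary list, and analyzing Chinese collocation structures.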
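The abstract names the statistics used for grading but not how they were computed. As a hedged illustration only, the sketch below shows one common way to compute three of them on a pre-segmented corpus split into parts: raw word frequency, the coverage of the top-k most frequent words, and distribution uniformity measured with Juilland's D, where D = 1 − V/√(n − 1) and V is the coefficient of variation of a word's frequency across n equal-sized parts. Whether the COCT grading used Juilland's D or a different dispersion measure is an assumption here; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def word_frequencies(parts):
    """Count token frequencies over all corpus parts (each part is a list of tokens)."""
    counts = Counter()
    for tokens in parts:
        counts.update(tokens)
    return counts


def coverage(counts, k):
    """Share of all tokens accounted for by the k most frequent word types."""
    total = sum(counts.values())
    top = sum(freq for _, freq in counts.most_common(k))
    return top / total if total else 0.0


def juilland_d(word, parts):
    """Juilland's D (assumed dispersion measure): 1 - V / sqrt(n - 1),
    where V = population std. dev. / mean of the word's per-part frequency.
    Assumes the n parts are of equal size. Near 1 = evenly spread,
    near 0 = concentrated in few parts."""
    n = len(parts)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    freqs = [tokens.count(word) for tokens in parts]
    m = mean(freqs)
    if m == 0:
        return 0.0  # word absent from every part
    v = pstdev(freqs) / m
    return 1 - v / sqrt(n - 1)


# Hypothetical toy corpus: three equal-sized, pre-segmented parts (4 tokens each).
parts = [
    ["我", "喜歡", "學", "華語"],
    ["我", "每天", "學", "漢字"],
    ["老師", "說", "我", "好"],
]
counts = word_frequencies(parts)
print(counts.most_common(2))          # [('我', 3), ('學', 2)]
print(round(coverage(counts, 2), 3))  # 0.417
for w in ("我", "學", "漢字"):
    print(w, round(juilland_d(w, parts), 3))
```

In this toy data 我 occurs once in every part (D = 1), 學 is missing from one part (D = 0.5), and 漢字 occurs in only one part (D = 0). Combining frequency with a dispersion score of this kind is one way to keep words that are frequent only in a few texts from being graded as basic vocabulary.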


The National Academy for Educational Research (NAER) constructed the Corpus of Contemporary Taiwanese Mandarin (COCT) to support a comprehensive range of applications in Teaching Chinese as a Second Language (TCSL). The COCT comprises written-language, spoken-language, Chinese-English bilingual, and Chinese learner interlanguage corpora. This paper explores how the COCT can be applied to establish difficulty levels for Chinese characters, words, and grammar points in TCSL, and to develop corpus-based TCSL systems that integrate these grading standards. Drawing on statistical analyses of the COCT (word frequency, coverage, distribution uniformity, quasi-affixes, semantic-field-related words, word-formation rate, and character-composition capacity) and on consultations with experts and senior TCSL teachers, the researchers established grading standards for Chinese characters, words, and grammar points. By integrating these standards with corpus techniques, they also completed four systems: an NAER Concordance System, a Semantic-Field-Related Word Query System, an Automatic Typo Correction System for compositions, and an Example Sentence Editing Assistance System. Finally, the paper offers suggestions for future use of the COCT in constructing a common-word frequency table and a basic vocabulary table, and in analyzing Chinese collocation structures.
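A concordance (keyword-in-context, KWIC) display lists every occurrence of a query word together with a fixed window of surrounding words, which is the basic view a corpus concordance system offers teachers. The minimal sketch below assumes the text is already segmented into words; the function name `kwic` and its `window` and `sep` parameters are hypothetical and do not describe how the NAER Concordance System is actually implemented.

```python
def kwic(tokens, keyword, window=3, sep=" "):
    """Return (left context, keyword, right context) for each hit of keyword.

    tokens  -- a pre-segmented list of words
    window  -- number of words to keep on each side of the hit
    sep     -- string used to join context words for display
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = sep.join(tokens[max(0, i - window):i])
            right = sep.join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical segmented sentence used only for illustration.
tokens = ["我", "想", "學", "華語", "因為", "學", "華語", "很", "有趣"]
for left, kw, right in kwic(tokens, "華語", window=2):
    # Right-align the left context so the keyword column lines up.
    print(f"{left:>10} [{kw}] {right}")
```

Aligning hits on the keyword column makes recurring neighbours (here 學 before 華語) easy to spot, which is also why concordance output is a common starting point for collocation analysis.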


