Analyzing the Features of the High-Frequency Words on Chinese Spoken Corpus and Offering the Word-recruiting Suggestion to TOCFL Wordlist



Parallel abstracts

A wordlist serves as a reference for second language teaching and helps second language learners judge which words they need to acquire. The TOCFL wordlist is one of the common learning materials for learners preparing for the Chinese proficiency test. However, the words in the TOCFL wordlist were largely selected from a written corpus, and only a limited number were extracted from a spoken corpus. Because written and spoken corpora are presumably different, it is necessary to include words from both registers and to emphasize the differences in teaching. To balance the proportions of written and spoken words in the TOCFL wordlist, this study first established a native spoken corpus by extracting subtitles from Mandarin movies and TV series, and then compiled a list of high-frequency spoken words as an amendment to the TOCFL wordlist. A comparison of this spoken wordlist with the TOCFL wordlist showed that 713 high-frequency words in the corpus were not covered by the TOCFL wordlist. We then suggested a list of the top 238 high-frequency words for inclusion in the TOCFL wordlist. The 713 high-frequency spoken words were further classified into six groups based on their features, and the key findings are summarized as follows: (1) the majority of the items are word chunks, (2) the spoken words are characteristically multi-syllable, and (3) the list contains a large number of word combinations with bu and mei. We hope that this commonly used spoken wordlist can increase the proportion of spoken words in the TOCFL wordlist and thereby offer learners more authentic materials to meet their oral communication needs.
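
The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes the subtitle text has already been segmented into words; the study's actual segmentation tool, frequency cut-offs, and data are not given here, so the function name, the toy tokens, and the sample reference list below are all hypothetical.

```python
from collections import Counter


def uncovered_high_frequency_words(tokens, reference_wordlist, top_n):
    """Return the top_n most frequent tokens that are absent from the reference wordlist.

    tokens: iterable of already-segmented words from the spoken corpus (assumed input).
    reference_wordlist: iterable of words in the existing wordlist (e.g. a TOCFL-style list).
    """
    counts = Counter(tokens)
    reference = set(reference_wordlist)
    # Walk the words from most to least frequent and keep those not in the reference list.
    missing = [(word, freq) for word, freq in counts.most_common() if word not in reference]
    return missing[:top_n]


# Toy data for illustration only; not taken from the study's corpus.
subtitle_tokens = ["不會", "沒關係", "你", "不會", "沒關係", "不會", "你", "好"]
reference_sample = ["你", "好"]

result = uncovered_high_frequency_words(subtitle_tokens, reference_sample, top_n=2)
print(result)
```

In this toy run, the words kept are those that occur often in the subtitle tokens but do not appear in the sample reference list, which mirrors the idea of proposing uncovered high-frequency spoken words as additions.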


