Comparison of TOCFL and HSK Wordlists and Suggestions on Word-selection



A general service list is at the core of language learning and has received attention from language learners, teachers, and textbook writers. In the field of Chinese learning, the TOCFL and HSK wordlists are the best known. However, few studies have compared these two important wordlists. In this study, we compared the TOCFL and HSK wordlists and found the following: (1) The two wordlists share 3,700 words; the TOCFL wordlist contains 3,619 words that are absent from the HSK wordlist, whereas the HSK wordlist contains only 1,296 words that are absent from the TOCFL wordlist. (2) The TOCFL and HSK wordlists are laid out differently. (3) The HSK wordlist contains more than 4.3 times as many four-character words as the TOCFL wordlist. (4) The TOCFL wordlist contains more "er" words than the HSK wordlist. Furthermore, we noticed that both wordlists include some low-frequency words. Using two large corpora built from web pages of Mainland China and Taiwan, we compiled two lists of the top 10,000 high-frequency words and compared each with the corresponding wordlist. The comparison showed that 4,770 high-frequency words from the TaiwanWac corpus are not included in the TOCFL wordlist, and that 6,357 high-frequency words from the Tenten corpus are not included in the HSK wordlist. To improve the quality and coverage of the TOCFL and HSK wordlists, we suggest that some of these high-frequency words be added to them. The findings and recommendations of this study can serve as a useful reference for Chinese learners, teachers, textbook writers, and language testing agencies.
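The two kinds of comparison described above (overlap between two wordlists, and high-frequency corpus words missing from a wordlist) can be illustrated with a minimal Python sketch. This is not the study's actual procedure: the function names `compare_wordlists` and `missing_high_frequency` are hypothetical, the corpus is assumed to be already segmented into word tokens, and both lists are assumed to be normalized to the same script before comparison.

```python
from collections import Counter


def compare_wordlists(list_a, list_b):
    """Split two wordlists into shared words and words unique to each list."""
    a, b = set(list_a), set(list_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def missing_high_frequency(tokens, wordlist, top_n=10_000):
    """Return the top_n most frequent corpus words that the wordlist lacks.

    `tokens` is assumed to be an iterable of already-segmented words.
    The result keeps the frequency order of the corpus (most frequent first).
    """
    counts = Counter(tokens)
    known = set(wordlist)
    return [word for word, _ in counts.most_common(top_n) if word not in known]
```

With toy inputs (illustrative only, not taken from either wordlist), `compare_wordlists(["我", "你", "好"], ["你", "好", "人"])` puts "你" and "好" in the shared set, and `missing_high_frequency` with `top_n=2` reports the two most frequent tokens that the given wordlist does not contain.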


