分析華語口語語料庫高頻詞之特點並對TOCFL詞表提出建議

Analyzing the Features of the High-Frequency Words on Chinese Spoken Corpus and Offering the Word-recruiting Suggestion to TOCFL Wordlist

Abstract


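The main function of a wordlist is to set out the vocabulary that learners should acquire, so it can serve as a reference for second-language vocabulary teaching. The TOCFL wordlist, a well-known Chinese wordlist in Taiwan, is commonly used for testing and by learners. However, its word selection draws mainly on word frequencies from written-language corpora, and spoken data account for a relatively small share. Written and spoken vocabulary belong to different registers, so language teaching should not be confined to written language, and word selection should cover spoken vocabulary as well. To examine whether some high-frequency spoken words could be recommended for inclusion in the TOCFL wordlist, so that written and spoken sources are more evenly represented, this study collected conversational dialogue from Mandarin TV series and films as native-speaker spoken data, derived high-frequency spoken words by ranking word frequency, and compared them with the TOCFL wordlist. The comparison showed that 713 high-frequency spoken items are not included in the TOCFL wordlist. The study proposes the 238 items with the highest frequency and the strongest spoken-keyword characteristics as candidates for revising the TOCFL wordlist. It also classifies the 713 items according to six features of spoken vocabulary compiled from earlier literature, and summarizes the following characteristics: (1) spoken vocabulary mostly appears as chunk-like compound words; (2) spoken vocabulary contains relatively many multi-syllable idiomatic expressions; (3) chunks formed with 不 (bu) and 沒 (mei) are especially numerous. Finally, based on word frequency and the features of spoken vocabulary, the study recommends words that could be added to the TOCFL wordlist, with the aim of raising the share of spoken vocabulary, enriching and diversifying the wordlist, and giving Chinese language teaching a vocabulary reference that is closer to real spoken communication.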

Parallel Abstract


A wordlist serves as a reference for second language teaching and helps second language learners identify the words they need to acquire. The TOCFL wordlist is a common learning resource for learners preparing for Chinese proficiency tests. However, the words in the TOCFL wordlist were selected largely from written corpora, and relatively few were drawn from spoken data. Because written and spoken vocabulary belong to different registers, teaching should cover words from both and make the differences explicit. To balance the proportions of written and spoken words in the TOCFL wordlist, this study first built a native-speaker spoken corpus from the dialogue of Mandarin movies and TV series, and then compiled a list of high-frequency spoken words as a proposed amendment to the TOCFL wordlist. A comparison of this spoken wordlist with the TOCFL wordlist showed that 713 high-frequency words in the corpus were not covered by the TOCFL wordlist. We then proposed the 238 words with the highest frequency and the strongest spoken-keyword characteristics for inclusion in the TOCFL wordlist. The 713 high-frequency spoken words were further classified according to six features of spoken vocabulary drawn from earlier literature, and the key findings were as follows: (1) the majority of the items are word chunks, (2) the list contains many multi-syllable idiomatic expressions, and (3) a large number of the chunks are combinations with bu and mei. We hope that this list of commonly used spoken words can increase the proportion of spoken vocabulary in the TOCFL wordlist and offer learners more authentic material for their oral communication needs.
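The comparison step described in the abstract (rank spoken words by frequency, then keep those missing from the reference wordlist) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes the dialogue has already been segmented into space-separated words, and the function name `high_frequency_gaps`, the toy dialogue lines, and the small `reference_wordlist` set are all hypothetical stand-ins, not the study's corpus or the real TOCFL wordlist.

```python
from collections import Counter


def high_frequency_gaps(dialogue_lines, reference_wordlist, top_n):
    """Return (word, count) pairs for the top_n most frequent words
    that do not appear in reference_wordlist.

    Assumes each line is already word-segmented and space-separated.
    """
    counts = Counter()
    for line in dialogue_lines:
        counts.update(line.split())
    # Sort by descending frequency, then by the word itself so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count) for word, count in ranked[:top_n]
            if word not in reference_wordlist]


# Toy, pre-segmented dialogue lines (hypothetical, not from the study's corpus).
dialogue_lines = [
    "我 不會 去",
    "沒關係 我 不會 生氣",
    "你 不會 吧",
    "沒關係 吧",
]
# Hypothetical stand-in for a reference wordlist (not the real TOCFL list).
reference_wordlist = {"我", "你", "去", "生氣", "吧"}

gaps = high_frequency_gaps(dialogue_lines, reference_wordlist, top_n=4)
print(gaps)
```

In this toy data the two items that surface are chunks built on 不 and 沒, which mirrors the kind of item the abstract reports. A real pipeline would also need a word segmenter and a frequency cut-off, neither of which the abstract specifies.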
