Cross-Strait Lexical Differences: A Comparative Study Based on the Chinese Gigaword Corpus


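In recent years, frequent cross-strait exchange has led the vocabularies used on the two sides to influence each other heavily. Linguistic research on Chinese vocabulary, whether phonological, semantic, or pragmatic, has found subtle differences in the words the two sides use even though they share the same language. Yet both sides do use a writing system based on Chinese characters, and only the character forms show predictable, regular correspondences. On the premise that both sides write in Chinese characters, this paper compares usage in traditional and simplified Chinese to identify the characteristics of cross-strait vocabulary and to explore issues related to semantic correspondence and change. First, following the mappings in Hong and Huang (2006) and taking the English WordNet as the standard of comparison, we compare the vocabulary used in two Chinese versions of WordNet, the Chinese Concept Dictionary (CCD) of Peking University and the Chinese Wordnet (CWN) of the Institute of Linguistics, Academia Sinica, to examine how the two sides express the same concepts. We then take the vocabulary used in CCD and CWN and measure its relative usage rates in the traditional and simplified Chinese data of the Gigaword Corpus, to investigate where the two sides use the same word or different words and how these cases are distributed. Finally, we compare and verify the results against traditional and simplified Chinese data retrieved through Google web search.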


cross-strait vocabulary, word meaning, concept


Studies of cross-strait lexical differences in Mandarin Chinese show that the divergence between the two varieties has become increasingly evident. This divergence appears in phonological, semantic, and pragmatic analyses and has become an obstacle to knowledge sharing and information exchange. Given the wide range of divergences, Chinese character forms seem to offer the most reliable regular mapping between usage on the two sides. In this study, we take wordforms written in the shared character system as the basis for discovering cross-strait lexical differences and exploring their contrasts and variations. Building on Hong and Huang (2006), we use WordNet as a common reference and compare the Chinese Concept Dictionary (CCD) with Chinese Wordnet (CWN) to examine how the two sides express the same concepts. We then take all words that appear in CCD and CWN, compare their relative usage in the traditional Chinese and simplified Chinese data of the Gigaword Corpus, examine their occurrence and distribution, and verify the results against data retrieved through Google web search.


