A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications




Benjamin K. T'sou;Hing-Lung Lin;Godfrey Liu;Terence Chan;Jerome Hu;Ching-Hai Chew;John K. P. Tse

Volume or Term/Year and Month of Publication

Vol. 2, No. 1 (1997/02/01)

Page #

91 - 104

English Abstract

Like English, Spanish and Arabic, Chinese is used by a large number of speakers in distinct speech communities which share a common language yet vary in interesting ways. A systematic study of this linguistic variation is invaluable for appreciating the diversity and richness of the underlying cultures. This paper describes Project LIVAC (Linguistic Variation in Chinese Communities), which develops a Chinese corpus from data sampled concurrently, at regular intervals, from multiple Chinese speech communities. The resulting database and computerized concordance, drawn from a corpus of approximately 20 million words with uniform time reference points extending across two years, enable linguists and social scientists to undertake meaningful qualitative and quantitative comparative analyses of the development of linguistic and cultural variation. To facilitate these studies, a framework for integrating the corpus with specific corpus analysis applications is proposed. Based on this framework, a prototype retrieval system that supports longitudinal studies of word and concept distribution, as well as of lexical and other linguistic variation, has been designed and implemented.
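
The kind of query such a retrieval system supports can be illustrated with a minimal sketch. It is not the LIVAC implementation: the sample records, community and period labels, and the function names `build_index`, `distribution`, and `concordance` are all hypothetical, and the texts are assumed to be segmented into words already.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (community, period, pre-segmented tokens).
# Communities, periods, and tokens are illustrative only, not LIVAC data.
SAMPLES = [
    ("Hong Kong", "1995-07", ["的士", "司機", "罷工"]),
    ("Taiwan", "1995-07", ["計程車", "司機", "抗議"]),
    ("Hong Kong", "1995-08", ["的士", "加價"]),
    ("Taiwan", "1995-08", ["計程車", "費率", "調整"]),
]


def build_index(samples):
    """Count tokens per (community, period) cell of the corpus."""
    index = defaultdict(Counter)
    for community, period, tokens in samples:
        index[(community, period)].update(tokens)
    return index


def distribution(index, word):
    """Return {community: {period: count}} for one word across all cells."""
    table = defaultdict(dict)
    for (community, period), counts in sorted(index.items()):
        # Counter returns 0 for a word that is absent from a cell.
        table[community][period] = counts[word]
    return dict(table)


def concordance(samples, word, width=1):
    """Yield (community, period, left context, keyword, right context) per hit."""
    for community, period, tokens in samples:
        for i, token in enumerate(tokens):
            if token == word:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                yield community, period, left, token, right
```

Calling `distribution(build_index(SAMPLES), "的士")` tabulates one word by community and period, which is the shape of a longitudinal comparison. Because every community is sampled at the same time points, the counts in one row can be compared directly with those in another.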

Topic Category Humanities > Library and Information Science
Basic and Applied Sciences > Information Science
Engineering > Electrical Engineering
References
  1. (1997). Project LIVAC.
  2. Chen, C. Y. (1984). New Papers on Chinese Language Use.
  3. Chen, K. J., Huang, C. R., Chen, C. Y., Tseng, S. F. (1993). Proc. 1st Pacific Asia Conf. on Formal and Computational Linguistics.
  4. Chen, P. P. S. (1976). The Entity-Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1), 9-36.
  5. Sanders, G. L. (1995). Data Modeling.
  6. Tse, K. P. J. (1986). Standardization of Chinese in Taiwan. International Journal of the Sociology of Language, 59, 25-32.
  7. Tsou, B. K. (1996). 2nd Conference of Language Modernization.
  8. Tsou, B. K. (1983). Language Atlas of the Pacific Region.
  9. Tsou, B. K. (1995). Symposium on Prisma Sprache: Chinesische Versuche zur Bewältigung westlichen Gedankenguts.
  10. Tsou, B. K. (1996). 3rd Annual Conference of the Y.R. Chao Centre for Chinese Linguistics.
  11. Tsou, B. K. (1993). Language, Law and Equality: Proceedings of the 3rd International Conference of the International Academy of Language Law (IALL).
  12. Tsou, B. K. (1995). 1st National Conference on Language and Writing Applications, National Commission on Language Reform.
  13. Tsou, B. K. (1975). On the Linguistic Covariants of Cultural Assimilation. Anthropological Linguistics, 17(9), 445-465.
  14. Tsou, B. K. (1989). Some Aspects of Language in Hong Kong and China [in Chinese]. Chinese Language Bulletin, Chinese University of Hong Kong, 4, 3-9.
  15. Tsou, B. K., Liu, K. F., Wong, P. K., Sze, M., Lun, C. S. (1990). Proceedings of the World Conference on Chinese Language Teaching.
  16. Tsou, B. K., Sze, M., Liu, F., Wong, P. K., Lun, C. S. (1990). Cantonese and Putonghua in Hong Kong TV News Programmes: A Preliminary Study in Language Use [in Chinese]. 2nd Conf. on Cantonese and Other Yue Dialects, 12(1), 70-76.
  17. Chinese Vocabulary Research Group Committee (1986). A Study of the Chinese Vocabulary of Hong Kong Junior Secondary School Students [in Chinese].
  18. Chinese Knowledge Information Processing Group (CKIP), Institute of Information Science, Academia Sinica (1993). Character Frequency Statistics of the News Corpus [in Chinese].
  19. Institute of Language Teaching, Beijing Language Institute (1990). Frequency Dictionary of Commonly Used Words in Modern Chinese [in Chinese].
  20. Liu, Yuan (1990). Frequency Dictionary of Commonly Used Words in Modern Chinese [in Chinese].
Times Cited
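  1. 謝易達 (2000). A Study of the Development of the Chinese Communist Party under the Influence of the Communist International, 1921-1943 [in Chinese]. Thesis, Graduate Institute of the Three Principles of the People, National Taiwan Normal University. 1-280.
  2. 陳益宏 (2010). A Multimedia Distance Interaction System Using Speech-Driven Technology [in Chinese]. Thesis, Department of Electrical Engineering, National Cheng Kung University. 1-51.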