Corpus Escrito de Aprendices Taiwaneses de Español I: Uso y Aplicación

Operation and Application of the Taiwanese Learners' Written Corpus of Spanish I

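Abstract


Prompted by the growing prominence of corpus linguistics and by the needs of third-language research in applied linguistics, we began building the Taiwanese Learners' Written Corpus of Spanish in 2005. The data come from compositions written by Taiwanese learners of Spanish as a third language. Over seven years we collected 2,425 compositions, amounting to about 440,000 words of learner data. With the support of our partner team, and in keeping with the principle of exchanging and sharing academic resources, we are first releasing the data collected in Phase I (2005 to 2007) and providing a search and query service for the corpus (http://corpora.flld.ncku.edu.tw). Unlike a raw corpus that has not been annotated, this corpus is annotated with parts of speech, word roots (lemmas), and error corrections, so its search results are more informative and of greater practical value. In the future, besides expanding the quantity and types of data, we aim to improve the annotation techniques and strengthen the search functions, so as to offer the Spanish-language community a more convenient and effective corpus search resource.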

Parallel Abstract


Learner corpora of English have been developed and studied throughout the world, whereas learner corpora of Spanish remain underdeveloped. In response to the growing prominence of corpus linguistics and the emerging demand for third-language research in applied linguistics, we began constructing the Taiwanese Learners' Written Corpus of Spanish (CEATE, Corpus Escrito de Aprendices Taiwaneses de Español) in 2005. Over the past 7 years, we have collected 2,425 compositions comprising approximately 440,000 words, written by Taiwanese learners whose L1 is Mandarin Chinese and who are learning English as a second language and Spanish as a third. Built through teamwork and with technical support from computational linguistics, CEATE is intended primarily as a shared academic resource. We are currently releasing the data collected during the first 3 years (Phase I) for public use (http://corpora.flld.ncku.edu.tw). In contrast to a raw corpus, ours is POS-tagged and correction-annotated. By accessing this corpus, researchers, teachers, and learners can therefore conduct efficient, practical, and systematic searches directly related to their particular interests. In the near future, we will not only increase the amount and types of data but also improve the annotation techniques and search functions, in order to offer a useful resource for the study of Spanish language learning.

