  • Journal
  • Open Access

The Formosan Language Archive: Linguistic Analysis and Language Processing


In this paper, we describe the linguistic analysis approach adopted in the Formosan Language Corpora, one of the three main databases in the Formosan Language Archive, and the language processing programs built on it. We first discuss problems related to transcribing the corpora of different languages. We then deal with annotation rules and standards. Next, we explain how clauses, sentences and paragraphs are identified linguistically, and describe the computer programs used to align words, glosses and sentences in Chinese and English. Finally, we show how we try to cope with analytic inconsistencies through programming. This paper complements Zeitoun et al. [2003], in which we provided an overview of the whole architecture of the Formosan Language Archive.
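As a rough illustration of the kind of word/gloss alignment and consistency checking described above, here is a minimal Python sketch. It is not the Archive's own code, which is not described here. It assumes that each annotated sentence has a word tier, a gloss tier with one gloss per word, and free translations in Chinese and English, and that morpheme boundaries are marked with hyphens in both tiers, in the style of the Leipzig Glossing Rules. The class and function names (`InterlinearSentence`, `align_words_and_glosses`, `check_morpheme_boundaries`, `render_interlinear`) and the sample tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterlinearSentence:
    """Hypothetical record: source words, one gloss per word, free translations."""
    words: list[str]
    glosses: list[str]
    chinese: str
    english: str


def align_words_and_glosses(sentence: InterlinearSentence) -> list[tuple[str, str]]:
    """Pair each word with its gloss; raise if the two tiers differ in length."""
    if len(sentence.words) != len(sentence.glosses):
        raise ValueError(
            f"tier mismatch: {len(sentence.words)} words vs "
            f"{len(sentence.glosses)} glosses"
        )
    return list(zip(sentence.words, sentence.glosses))


def check_morpheme_boundaries(pairs: list[tuple[str, str]]) -> list[str]:
    """Flag pairs whose hyphen-delimited morpheme counts disagree (assumed convention)."""
    problems = []
    for word, gloss in pairs:
        if word.count("-") != gloss.count("-"):
            problems.append(f"{word!r} / {gloss!r}: morpheme count differs")
    return problems


def render_interlinear(sentence: InterlinearSentence) -> str:
    """Lay out words above glosses in padded columns, then the two translations."""
    pairs = align_words_and_glosses(sentence)
    widths = [max(len(w), len(g)) for w, g in pairs]
    word_line = "  ".join(w.ljust(n) for (w, _), n in zip(pairs, widths))
    gloss_line = "  ".join(g.ljust(n) for (_, g), n in zip(pairs, widths))
    return "\n".join(
        [word_line.rstrip(), gloss_line.rstrip(), sentence.chinese, sentence.english]
    )


if __name__ == "__main__":
    # Invented placeholder tokens, not data from the Formosan Language Corpora.
    sentence = InterlinearSentence(
        words=["aa-bb", "cc"],
        glosses=["PFX-stem", "word"],
        chinese="(中文翻譯)",
        english="(English translation)",
    )
    pairs = align_words_and_glosses(sentence)
    print(pairs)  # [('aa-bb', 'PFX-stem'), ('cc', 'word')]
    print(check_morpheme_boundaries(pairs))  # []
    print(render_interlinear(sentence))
```

The sketch raises an error on a tier-length mismatch instead of relying on `zip` alone, because `zip` silently truncates to the shorter tier. That would hide exactly the kind of annotation inconsistency a checking program is meant to surface.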


Adelaar, K. A., Zeitoun, E., & Li, P. J.-K. (1999). Selected Papers from the Eighth International Conference of Austronesian Linguistics. Taipei: Academia Sinica.
Bird, S., & Simons, G. (2003). Language.
Blust, R. (2003). Thao dictionary. Taipei: Academia Sinica.
Blust, R. The Austronesian Languages.
Bow, C., Hughes, B., & Bird, S. (2004). Proceedings of EMELD 2004: The Workshop on Linguistic Databases and Best Practice. Detroit, Michigan.
