Analysis of Taiwanese Word Usage: Using Three Taiwanese New Testament Bibles as Examples


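Taiwanese has been shaped by dominant languages such as Japanese and Mandarin and by a range of historical, political, and social factors, so its word usage has probably changed considerably over the past hundred years. How many words have emerged, how many have disappeared, and whether lexical or phonetic change follows particular rules or directions are the questions we set out to explore. As our corpus we use three Taiwanese New Testaments in Romanized script: "The New Testament Sin-iok" (Barclay translation, published in 1916), "The New Testament of Ko Tan version" (published in 1972), and "The New Testament Translated in Modern Taiwanese" (published in 2008). We typed in the texts, transcribed them into mixed Han-Romanized text, and developed a program to align the words of the Romanized and Han-Romanized written forms. We then computed word frequencies and compared the words of the three versions to identify the words common to all three corpora, the words no longer in use, and the newly appearing words. We also compared words that differ by accent and words whose pronunciation has changed. This shows how Taiwanese word usage has changed over a century, and we analyze the tendencies and causes of these changes. The results show that the three versions together contain more than 500,000 word tokens and 12,140 word types. Of these, 1,900 word types (15.7%) appear only in the 1916 New Testament, and some of them are rarely used today; 2,039 word types (16.8%) appear only in the 2008 New Testament, and these newly appearing words show clear Mandarin influence. In addition, some words with the same meaning have shifted from colloquial readings toward literary readings.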


lexical change, New Testament, Taiwanese, corpus


Taiwanese has been influenced by dominant languages such as Japanese and Mandarin and by historical, political, and social transitions in Taiwan, so one would expect rapid lexical change over the past one hundred years. How many new words have been produced? How many words have disappeared? Are there rules or trends in lexical and phonetic change? This project explores these questions.

We collected three versions of the Taiwanese New Testament as our corpora: "The New Testament Sin-iok" (Barclay translation, published in 1916), "The New Testament of Ko Tan version" (the red-covered Bible, published in 1972), and "The New Testament Translated in Modern Taiwanese" (published in 2008), all written in Romanized script (Pe̍h-oē-jī, vernacular writing). We entered the texts into a computer, tagged the metadata, and transcribed them into mixed Han-Romanized script paragraph by paragraph. Next, we developed a program to align the two scripts word by word, counted word frequencies, and compared the words appearing in the three versions. We looked for words common to all three versions, words no longer in use, and words that have newly emerged. Then, by matching word pairs (Romanized script and mixed Han-Romanized script) that share the same Romanized form but differ in mixed Han-Romanized form across versions, including words in different accents or with different pronunciations, we were able to measure lexical change in Taiwanese over the past hundred years more precisely. We also analyzed the tendencies and causes of this change.

The results show that the three versions together contain more than 500,000 word tokens and 12,140 word types. Of these, 1,900 word types (15.7%) appear only in the 1916 edition of the New Testament, and some of them are rarely used nowadays. Another 2,039 word types (16.8%) appear only in the 2008 edition. These "new words" show a clear influence from Mandarin. In addition, we found that some words with the same meaning have shifted from vernacular pronunciation to literary pronunciation.


lexical change, New Testament Bibles, Taiwanese, corpus
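
The frequency-counting and version-comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the function names `compare_versions` and `percent` are hypothetical, the token lists are invented toy data rather than corpus excerpts, and the sketch assumes each version has already been segmented into a list of Romanized word tokens.

```python
from collections import Counter


def compare_versions(versions):
    """Compare word types across versions.

    versions: dict mapping a version label to its list of word tokens.
    Returns (freqs, common, exclusive, all_types).
    """
    # Token frequency per version; the Counter keys are the word types.
    freqs = {label: Counter(tokens) for label, tokens in versions.items()}
    types = {label: set(freq) for label, freq in freqs.items()}

    # Word types seen in any version, and word types shared by all versions.
    all_types = set().union(*types.values())
    common = set.intersection(*types.values())

    # Word types that occur in exactly one version.
    exclusive = {}
    for label, own in types.items():
        others = set().union(*(t for other, t in types.items() if other != label))
        exclusive[label] = own - others
    return freqs, common, exclusive, all_types


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


# Toy token lists, invented for illustration only (not taken from the corpora).
toy_versions = {
    "1916": ["Siōng-tè", "lâng", "lâng", "chhù", "chhân"],
    "1972": ["Siōng-tè", "lâng", "chhù", "kok"],
    "2008": ["Siōng-tè", "lâng", "kok", "sè-kài"],
}

freqs, common, exclusive, all_types = compare_versions(toy_versions)
print(sorted(common))
print({label: sorted(words) for label, words in exclusive.items()})
print(percent(len(exclusive["1916"]), len(all_types)))
```

The same ratio reproduces the percentages reported in the abstract: 1,900 of 12,140 word types is about 15.7%, and 2,039 of 12,140 is about 16.8%. Plain set operations are used here because the comparison is defined over word types, while the `Counter` objects keep the token frequencies available for the frequency analysis.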


