An Investigation of Statistical Word Learning in Chinese as a Second Language

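"Statistical learning" is the human ability to detect and compute statistical information between the units of a signal and to induce the regularities by which those units combine. Chinese text is printed with spaces between characters but has no explicit word-boundary cues, so readers face a word-segmentation problem when reading. Previous studies of Chinese word segmentation have mostly examined the outcome of readers' segmentation rather than how readers segment words. This study hypothesizes that learners of Chinese as a second language can use a statistical learning mechanism to compute the transitional probability (TP) between adjacent language units and use it as a basis for word segmentation. Experiments 1 to 5 used a paradigm modified from Saffran et al. (1997). In Experiment 1, 6 Chinese syllables were combined into 6 disyllabic Chinese words, which formed a continuous stream of 3,600 syllables. The TP between adjacent syllables within a word ranged from .46 to 1, and the TP between adjacent syllables across a word boundary ranged from 0 to .29. These transitional probabilities were the only cue to word boundaries. After listening to the material, participants chose the combinations they had heard in a test. Mean accuracy in Experiment 1 was .57, indicating that participants could locate syllable-word boundaries on the basis of transitional probabilities. Experiments 2 to 4 presented, in the visual modality, Chinese character strings with the same statistical distribution. Performance in each of these three experiments was only marginally significant (Experiment 2: .53; Experiment 3: .53; Experiment 4: .52), but the mean accuracy of the three experiments combined reached significance, indicating that participants could segment words through visual statistical learning. Experiment 5 examined whether the prior experience of native Chinese speakers affects their statistical learning performance. The results showed that when the statistical information of new material is inconsistent with that of prior learning experience, prior experience does not help learners accumulate new statistical information. Experiments 6 to 8 used a paradigm modified from Fiser and Aslin (2002). In Experiment 6, 12 abstract shapes formed a stream of 288 shapes. The TP between adjacent shapes within a shape word was 1, and the TP between adjacent shapes across words was .33. Experiment 6 showed that participants could extract the combinatorial regularities of the abstract shape stream (.77). Experiment 7 replaced the material with Korean letters and found that participants could discover the statistical regularities of Korean letter words (.65). The material of Experiment 8 was Chinese character strings with more complex statistical information. The results showed that, given sufficient processing time, participants could grasp the complex statistical information between character units and use it to segment words (.67). Taken together, the results indicate that learners can use a statistical learning mechanism to acquire the statistical information between successive Chinese characters, find word boundaries, and segment words accordingly. This study also discusses factors that may affect the effectiveness of statistical learning and proposes a Chinese-language teaching approach grounded in statistical word learning.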
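The segmentation cue used throughout is the transitional probability, conventionally defined as TP(X→Y) = frequency(XY) / frequency(X). The Python sketch below is only an illustration of how such a stream and its TPs relate, built under assumptions the abstract does not state: the 12 shapes of Experiment 6 are given the hypothetical labels A to L, they are grouped into four three-item words, and a word never immediately follows itself. Under those assumptions, within-word TPs are exactly 1 and each word-final item is followed by one of three other words, so between-word TPs hover around 1/3, in line with the .33 reported above.

```python
import random
from collections import Counter


def make_stream(words, n_items, seed=0):
    """Concatenate randomly chosen words into one unsegmented stream.

    Assumption (not from the study): a word is never immediately repeated,
    so each word-final item is followed by one of the other words.
    """
    rng = random.Random(seed)
    stream, prev = [], None
    while len(stream) < n_items:
        word = rng.choice([w for w in words if w != prev])
        stream.extend(word)
        prev = word
    return stream[:n_items]


def transitional_probabilities(stream):
    """TP(x -> y) = freq(xy) / freq(x), counting x only where it has a successor."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical materials: 12 shape labels grouped into 4 three-item words.
shapes = "ABCDEFGHIJKL"
words = [tuple(shapes[i:i + 3]) for i in range(0, len(shapes), 3)]
stream = make_stream(words, 288)
tp = transitional_probabilities(stream)

# Separate the transitions that stay inside a word from those that cross a boundary.
within_pairs = {(w[j], w[j + 1]) for w in words for j in range(len(w) - 1)}
within = [tp[pair] for pair in sorted(within_pairs)]
between = [p for pair, p in tp.items() if pair not in within_pairs]

print("within-word TPs:", within)  # every value is 1.0
print("mean between-word TP:", round(sum(between) / len(between), 2))
```

The only difference between the two kinds of transition is their TP, which is what makes TP usable as the sole boundary cue in this kind of design.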

Keywords

statistical learning; word segmentation; Chinese; word learning; transitional probability

Parallel Abstract

Statistical learning is a pattern-induction ability that traces and computes statistical information from input. Previous research has demonstrated statistical learning with auditory linguistic input and visual nonlinguistic input, but no study has used real visual language symbols (letters or characters) as material. In Chinese text, there is no physical cue between words indicating word boundaries, so readers of Chinese text face a word-segmentation problem similar to that of listening to a continuous speech stream. In this study, we hypothesize that readers use the statistical information of adjacent Chinese characters to identify word boundaries. We employed two types of statistical learning paradigms to investigate statistical word learning in learners of Chinese as a second language (CSL). The paradigm of Exps. 1-5 was adopted from the study of Saffran, Newport, Aslin, Tunick, and Barrueco (1997). The material consisted of a continuous Chinese syllable string or an unspaced Chinese character string, and the transitional probabilities between adjacent syllables/characters were the only cue to word boundaries. Results showed that CSL learners could segment a continuous, natural-language-like syllable/character string into smaller units by calculating its statistical information. However, participants' well-established statistical knowledge of the material's units could dilute the learning of new material built from language units they already knew. In Exps. 6-8, abstract symbols, Korean letters, and Chinese characters were used in the visual statistical learning (VSL) paradigm of Fiser and Aslin (2002). The results suggested that participants could segment units from continuous visual input under different paradigm settings, but learning efficiency appeared to depend on how the material was presented. Together, these experiments demonstrated that readers could extract statistical information about adjacent characters from unspaced Chinese text through reading.
Possible constraints on the visual statistical learning of Chinese words, as well as teaching implications based on the results, are also discussed.
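How TPs could turn an unspaced character string into words can be sketched with one simple rule: place a boundary wherever the TP between two adjacent characters is lower than the TPs on either side of it (a "dip"). This Python sketch is an assumption for illustration, not the procedure reported in the study or a claim about what readers actually compute. The four two-character "words" built from the placeholder characters 甲乙丙丁戊己庚辛 are hypothetical, as are the names `segment_by_tp_dips` and `transitional_probabilities`.

```python
import random
from collections import Counter


def transitional_probabilities(seq):
    """TP(x -> y) = freq(xy) / freq(x), counting x only where it has a successor."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}


def segment_by_tp_dips(text):
    """Split an unspaced string wherever the adjacent-character TP is a local minimum."""
    tp = transitional_probabilities(text)
    tps = [tp[(a, b)] for a, b in zip(text, text[1:])]
    inf = float("inf")  # missing neighbours at the edges never block a boundary
    segments, start = [], 0
    for i, value in enumerate(tps):
        left = tps[i - 1] if i > 0 else inf
        right = tps[i + 1] if i + 1 < len(tps) else inf
        if value < left and value < right:
            segments.append(text[start:i + 1])  # boundary falls after character i
            start = i + 1
    segments.append(text[start:])
    return segments


# Hypothetical lexicon and an unspaced "text" with no immediate word repeats.
words = ["甲乙", "丙丁", "戊己", "庚辛"]
rng = random.Random(1)
chosen, prev = [], None
for _ in range(300):
    word = rng.choice([w for w in words if w != prev])
    chosen.append(word)
    prev = word
text = "".join(chosen)

segments = segment_by_tp_dips(text)
print(segments[:5])
```

Because every within-word TP in this toy string is exactly 1, each between-word transition is a dip and the original words are recovered. With materials like those of Experiment 1, where within-word TPs range from .46 to 1, the dips would be less clean and a rule this simple could misplace boundaries.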