Stylistic Variation in Mandarin Based on Factor and Correspondence Analyses

華語語體風格差異之因素分析及對應分析研究

Abstract


Multi-dimensional analysis (MDA; Biber, 1988) is a predominant approach in corpus-based studies of register and style. However, comparatively few MDA studies have addressed Mandarin Chinese, and the factors underlying Taiwan Mandarin have not yet been identified. This study developed a revised tagset specifically for Mandarin and investigated register variation by applying two multivariate approaches to a set of selected corpora covering 20 genres and comprising about 28 million tokens. First, factor analysis (FA) identified seven factors in Mandarin: 1. interpersonal vs. informational; 2. descriptive vs. vocal; 3. elaborative (vs. non-elaborative); 4. explanatory vs. narrative; 5. locative (vs. non-locative); 6. numeric (vs. non-numeric); 7. indicative vs. casual. The rankings of the 20 text types by factor score offered an analytical view of the stylistic elements of Mandarin Chinese. An FA-based analytic model was then constructed that can predict and identify genre types from the feature (tag) counts in a text. Second, correspondence analysis (CA) summarized the linguistic diversity of Mandarin along two dimensions: literacy and articulation. The biplots illustrated the correlated distributions of genres and part-of-speech (POS) features, which exhibited textual and stylistic differences. Four additional texts, of types other than the 20 present in the included corpora, were used to validate the proposed account of register variation. The results show that both FA and CA can capture register variation in Mandarin: FA identifies the finer aspects and distinctive features, while CA focuses on only two, yet critical, dimensions, with similarity clusters based on frequency data. The seven factors and two dimensions presented in this paper represent traits particular to Mandarin on which further stylistic investigations and cross-linguistic studies could be based.
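To make the idea of scoring a text from its tag counts concrete, the following is a minimal sketch of the conventional MDA dimension-score computation (normalize counts per 1,000 tokens, standardize each feature across texts, then sum the features that load positively and subtract those that load negatively). It is not the model constructed in this study: the tag names (`pronoun`, `particle`, `noun`), the toy counts, and the assignment of tags to the positive and negative poles are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def per_thousand(counts, n_tokens):
    """Normalize raw tag counts to a rate per 1,000 tokens."""
    return {tag: 1000.0 * c / n_tokens for tag, c in counts.items()}


def standardize(rates_per_text):
    """Convert each feature's rates to z-scores across all texts."""
    tags = sorted(rates_per_text[0])
    stats = {}
    for tag in tags:
        values = [rates[tag] for rates in rates_per_text]
        stats[tag] = (mean(values), stdev(values))
    return [
        {tag: (rates[tag] - stats[tag][0]) / stats[tag][1] for tag in tags}
        for rates in rates_per_text
    ]


def dimension_score(z_scores, positive, negative):
    """Sum z-scores of positive-pole tags minus those of negative-pole tags."""
    return sum(z_scores[t] for t in positive) - sum(z_scores[t] for t in negative)


# Hypothetical toy corpus: (tag counts, text length in tokens).
texts = [
    ({"pronoun": 90, "particle": 60, "noun": 150}, 1000),
    ({"pronoun": 50, "particle": 30, "noun": 250}, 1000),
    ({"pronoun": 10, "particle": 0, "noun": 350}, 1000),
]

rates = [per_thousand(counts, n) for counts, n in texts]
z = standardize(rates)

# Hypothetical loadings for one "interpersonal vs. informational"-style dimension.
positive_pole = ["pronoun", "particle"]
negative_pole = ["noun"]

scores = [dimension_score(zs, positive_pole, negative_pole) for zs in z]
print([round(s, 2) for s in scores])  # [3.0, 0.0, -3.0]
```

In this toy setup the first text scores toward the positive pole and the third toward the negative pole. Ranking genres by their mean score on each dimension gives the kind of ordering the abstract describes.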

Parallel Abstract


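Multi-dimensional analysis is a mainstream method in corpus and stylistic research; however, few attempts have been made to apply it to Standard Mandarin and Taiwan Mandarin. Targeting Taiwan Mandarin, this study proposes a revised tagset and uses two multivariate methods to analyze corpora covering 20 genres, totaling more than 28 million tokens, in order to investigate stylistic variation across Mandarin registers. The study first used factor analysis to identify seven dimensions in Mandarin: 1. interpersonal vs. informational; 2. descriptive vs. vocal; 3. elaborative (vs. non-elaborative); 4. explanatory vs. narrative; 5. locative (vs. non-locative); 6. numeric (vs. non-numeric); 7. indicative vs. casual. The 20 genres in the corpora were ranked by factor score to illustrate how the registers of Mandarin vary, and an analytic model was proposed that can predict and classify genre from the tag frequencies in a document. Next, through correspondence analysis, the study identified two dimensions that summarize register variation in Mandarin: literacy (word choice) and articulation (manner of expression). The biplots produced by the correspondence analysis show the correlated distributions of parts of speech and genres in the corpora. Four additional texts were then used to validate the proposed account of register variation and to test the fit of the factor-analysis model. The results show that both factor analysis and correspondence analysis can depict register variation: the former identifies finer-grained register factors and features, while the latter identifies similarity clusters on the basis of frequency information. The seven factors and two dimensions proposed here are tailored to the linguistic features of Mandarin and can serve as a basis for subsequent studies of register differences and cross-linguistic research.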
