
Lexical Bundles in Chinese (中文的常用詞串)

Advisor: 謝舒凱 (Shu-Kai Hsieh)

Abstract


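This dissertation aims to extract lexical bundles from spoken Chinese and Chinese news and to analyze how they are used in discourse. It establishes a classification framework for the linguistic structure of Chinese lexical bundles and closely examines their functions.

Three-word and four-word lexical bundles are extracted from the Academia Sinica Balanced Corpus of Mandarin Chinese (fourth edition). As a first step, word sequences that occur at least twenty times per million words and appear in at least five files are extracted automatically. This almost purely frequency-based approach leaves some methodological issues unresolved, so a more sensitive dispersion measure and a word association measure are additionally adopted, and the results are then analyzed manually.

Exploratory data analysis shows that conversation contains more bundle types than news, and that bundles also account for a larger proportion of the corpus in conversation than in news. The same tendency has long been observed in English. These findings imply that in natural speech, speakers face real-time pressure and therefore rely more heavily on prefabricated chunks such as lexical bundles.

The study then examines in depth the use of lexical bundles in Chinese conversation, providing cross-linguistic support for earlier findings in English. First, most bundles in Chinese conversation are structurally incomplete and span traditional grammatical structures, yet they can still be classified systematically by their structural features. Second, these bundles have three main types of function in discourse: they facilitate interpersonal communication (for example, expressing stance), they organize discourse (for example, introducing a topic), and they have various referential uses. Third, there is a clear association between the structure and the function of lexical bundles: stance bundles mostly take the form of clauses and verb phrases, whereas referential bundles mostly take the form of noun phrases. On the other hand, one notable difference between Chinese and English is that noun-based bundles are considerably more numerous in Chinese, which can be attributed to structural features particular to Chinese.

The study also examines in depth the use of lexical bundles in Chinese news. The results show that the conventions and principles of news writing, such as staying close to the facts, avoiding vagueness, relating news events to readers, and being concise, affect the distribution of bundles in news. For example, bundles that express imprecision or an uncertain stance are clearly less frequent in news than in conversation, while some discourse-organizing bundles that emphasize newsworthiness occur frequently in news. Despite these differences, the classification frameworks developed for conversation bundles still apply to news bundles, and the association between bundle structure and function also holds in the news register.

We hope that these findings on Chinese lexical bundles can, from a usage-based perspective, clarify how multi-word units emerge and explain the complex relationship between linguistic structure and the communicative needs of different registers. The bundles extracted in this study can be used to expand Chinese language resources, and can also serve as important reference material for language teachers and students and as material for psycholinguistic experiments.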
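The initial extraction step described above (at least twenty occurrences per million words, in at least five files) can be sketched as follows. This is a minimal illustration, not the dissertation's actual pipeline: the function name `extract_bundles`, its parameters, and the toy input are all hypothetical, and the input is assumed to be already word-segmented, one token list per corpus file.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=3, min_per_million=20.0, min_texts=5):
    """Return {n-gram: (rate per million words, number of texts)}.

    texts: list of token lists, one per corpus file (assumed pre-segmented).
    The thresholds mirror the figures given in the abstract; everything
    else here is an illustrative assumption.
    """
    total_tokens = sum(len(tokens) for tokens in texts)
    freq = Counter()                 # raw frequency of each n-gram
    text_range = defaultdict(set)    # ids of the texts each n-gram occurs in

    for text_id, tokens in enumerate(texts):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)

    bundles = {}
    for gram, count in freq.items():
        per_million = count * 1_000_000 / total_tokens
        n_texts = len(text_range[gram])
        if per_million >= min_per_million and n_texts >= min_texts:
            bundles[gram] = (per_million, n_texts)
    return bundles
```

Normalizing to a per-million rate lets the same frequency threshold be applied to subcorpora of different sizes, and the range criterion keeps out sequences that are frequent only because a few files repeat them.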

Parallel Abstract


The present dissertation aims to identify lexical bundles (i.e., recurrent word sequences) in Chinese conversation and news and to investigate their use in discourse. A structural taxonomy is created for lexical bundles in Chinese, and their functions are also closely examined.

Three-word and four-word lexical bundles are identified from the Academia Sinica Balanced Corpus of Mandarin Chinese (fourth edition). An initial list includes word sequences occurring at least twenty times per million words and in at least five corpus texts. Because this almost purely frequency-based approach raises important methodological issues, a more sensitive dispersion measure, a word association measure, and a manual analysis are also needed.

An exploratory data analysis of lexical bundles in Chinese shows that conversation features a much wider range of lexical bundles than newswire text; the proportion of corpus data covered by lexical bundles is also higher in conversation than in news. The same tendency has been observed in English, suggesting that in spontaneous speech, speakers are under real-time pressure and thus rely more heavily on prefabricated chunks such as lexical bundles.

A comprehensive investigation of lexical bundles in Chinese conversation provides further cross-linguistic support for previous findings in English. First, most lexical bundles in Chinese conversation are not structurally complete and run across traditional grammatical structures, but they can be systematically categorized according to their structural characteristics. Second, these bundles serve important functions in discourse: facilitating interpersonal communication (e.g., expressing stance), organizing discourse (e.g., introducing a topic), and fulfilling a variety of referential uses. Third, there is a strong relationship between the structure and the function of lexical bundles: stance bundles are closely associated with clausal and VP-based categories, whereas referential expressions are closely associated with NP-based categories. On the other hand, a striking difference between Chinese and English is that NP-based bundles are much more dominant in Chinese, which is attributed to structural characteristics specific to Chinese.

Furthermore, a detailed examination of lexical bundles in Chinese news suggests that conventions and principles of journalistic writing (e.g., sticking to facts, avoiding ambiguity, relating news events to readers, using shorter forms) influence the distribution of news bundles. For example, imprecision bundles and personal epistemic bundles that express an uncertain stance occur less frequently in news than in conversation, while some discourse organizers that identify something newsworthy to the reader occur more frequently in news than in conversation. Despite these differences, news bundles fit comfortably into the classification frameworks developed for conversation bundles, and the relationship between the structure and the function of lexical bundles is reconfirmed.

It is hoped that the findings of the present dissertation can elucidate the emergent nature of multi-word units from a usage-based perspective and illustrate the complex interactions between language-specific structural properties and genre-specific communicative needs. The lexical bundles identified in the present study can be used to enrich existing language resources in Chinese, and they may also serve as important references for language teachers and learners and as materials for psycholinguistic experiments.
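The abstract calls for a more sensitive dispersion measure but does not say which one is used. As one possible illustration only, the sketch below computes Gries's deviation of proportions (DP), a measure commonly used in corpus linguistics; choosing DP here is an assumption, and the function name and inputs are hypothetical.

```python
def deviation_of_proportions(counts, sizes):
    """Gries's DP for one word sequence across corpus parts.

    counts: occurrences of the sequence in each corpus part
    sizes:  size (in words) of each corpus part, in the same order
    Returns a value from 0 (spread exactly in proportion to part size)
    towards 1 (concentrated in a small part of the corpus).
    DP is an assumed choice; the abstract does not name its measure.
    """
    if len(counts) != len(sizes):
        raise ValueError("counts and sizes must have the same length")
    total_count = sum(counts)
    total_size = sum(sizes)
    if total_count == 0 or total_size == 0:
        raise ValueError("need at least one occurrence and a non-empty corpus")
    return 0.5 * sum(
        abs(c / total_count - s / total_size)
        for c, s in zip(counts, sizes)
    )
```

Unlike a simple count of the texts a sequence appears in, a measure of this kind takes into account how unevenly the occurrences are spread relative to the size of each corpus part.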

