Measuring the Semantic Specificity of Chinese Verbs: A Corpus-Based Quantitative Study

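This study investigates how the semantic specificity of Chinese verbs can be measured. To place the semantic content of Chinese verbs on a scale, we first manually annotated 150 basic verbs, dividing them into two types: generic verbs and specific verbs. Drawing on several accounts of semantic components in the literature, we propose three criteria for this judgment: the implication of an agent or instrument, the restriction on patient types, and the behavior of the verb under semantic shift. To standardize the type judgment, we adopt quantitative measures emphasized in corpus linguistics, namely word frequency, number of senses, and number of objects, as variables for verb type. Principal Component Analysis (PCA) is then used to determine the weight of each variable, and a Multinomial Logistic Model (MNLM) is used to discriminate between verb types. In addition, we use the Academia Sinica Balanced Corpus to build a distributional model and apply Latent Semantic Analysis (LSA) to convert verb meanings into high-dimensional vectors. In this vector-based model, the distribution of each word in the corpus is represented as a point in a high-dimensional space. Using distance measures and cluster analysis, we examine the similarity between words and the latent semantic relatedness between verb meanings and other words. The study further accounts for the distances between words of different verb types and for the semantic relatedness of Chinese resultative verb compounds.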
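As a hedged illustration of the LSA step mentioned above, the standard textbook formulation is shown here. The notation is an assumption for exposition and is not taken from the thesis: $X$ is an $m \times n$ word-by-context count matrix built from the corpus, $k \ll \min(m, n)$ is the number of retained dimensions, and $\mathbf{w}_i$ is the reduced vector for the $i$-th word.

```latex
X \;\approx\; X_k \;=\; U_k \, \Sigma_k \, V_k^{\top},
\qquad
\mathbf{w}_i \;=\; \left( U_k \Sigma_k \right)_{i,:} \;\in\; \mathbb{R}^{k}
```

Here $U_k$ and $V_k$ hold the first $k$ left and right singular vectors of $X$, and $\Sigma_k$ is the diagonal matrix of the $k$ largest singular values. Each verb then corresponds to one row vector $\mathbf{w}_i$, which is the "point in a high-dimensional space" the abstract refers to.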

Keywords

semantic specificity; Chinese verbs; quantitative study; latent semantic analysis

Parallel Abstract

This thesis studies semantic specificity in Chinese using corpus-based statistical and computational methods. The analysis begins with single verbs and then runs preliminary tests on Chinese resultative verb compounds. The verbs studied are one hundred and fifty head verbs collected in the M3 project. As a prerequisite, these verbs were tagged as either generic or specific according to three criteria proposed in the literature: the specification of agent or instrument, the limitation on objects and their types, and the confinement of the action denotation to physical action only. The next step is to measure semantic specificity with quantitative data. Three counts characterize the use of each verb: its frequency, its number of senses, and the range of objects it co-occurs with. Two major analyses, Principal Component Analysis (PCA) and the Multinomial Logistic Model, are adopted to assess the predictive power of these variables and to predict the probability of each verb category. In addition, the vector-based model of Latent Semantic Analysis (LSA) is applied to support the concept of semantic specificity. A distributional model built from the Academia Sinica Balanced Corpus (ASBC) with LSA is used to investigate how the semantic space varies with semantic specificity. Semantic similarity between words is calculated by measuring the distance between their vectors. The word-space model is used to measure the semantic loads of single verbs and to explore the semantic information in Chinese resultative verb compounds (RVCs).
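The distance-measurement step can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the measure and assumes word vectors are already available (for example, from an LSA model); the abstract does not say which distance measure the thesis uses. The function names and the toy vectors (`verb_a`, `verb_b`, `verb_c`) are hypothetical and are not data from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(target, vectors, top_n=2):
    """Rank all other words by cosine similarity to `target`, most similar first."""
    scores = [
        (word, cosine_similarity(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy vectors standing in for reduced LSA word vectors.
toy_vectors = {
    "verb_a": [1.0, 0.0, 0.0],
    "verb_b": [0.9, 0.1, 0.0],
    "verb_c": [0.0, 1.0, 0.0],
}

neighbors = nearest_neighbors("verb_a", toy_vectors)
```

With these toy vectors, `verb_b` ranks above `verb_c` as a neighbor of `verb_a` because its vector points in nearly the same direction. A cluster analysis would group words using the same pairwise similarities.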

Parallel Keywords

semantic specificity; verbs in Mandarin; quantitative study; latent semantic analysis

