

A Thesaurus-Based Semantic Classification of English Collocations

Advisor: 張俊盛


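This thesis proposes a new method for automatically labeling and classifying collocations by semantic concept. It investigates how a conceptual index of collocations relates to L2 learners' collocational competence and to collocation learning tools, and it examines the practical benefit of applying such an indexing framework to computer-assisted collocation learning tools, with the goal of building a classified dictionary of English collocations. Research on collocations has attracted growing attention. Most of it focuses on how collocation instruction improves English proficiency and argues that such instruction can replace the traditional dichotomy of vocabulary teaching and grammar teaching. Among collocations, verb-noun and adjective-noun combinations are the hardest for L2 learners to master, and computer-assisted language learning tools can make learning more efficient. In the training stage, our method first uses a random walk algorithm and a concept-indexed English thesaurus to select, for each word, the WordNet sense most relevant to a given concept. Next, it uses the semantic relations encoded in WordNet to expand, through semantic links, the set of words most relevant to each concept. Finally, we select from a learner's dictionary the words whose collocations are hardest to master and treat them as keywords. For each verb-noun and adjective-noun collocation, we apply the random walk algorithm again to label the collocates of each keyword with the most relevant semantic concept, and then group the originally scattered collocation entries by concept. This gives existing computer-assisted collocation search tools a systematic, concept-level retrieval structure. We implemented an automatic semantic labeling and classification system for collocations and, using 859 collocation entries from the JustTheWord language learning tool as test data, compared JustTheWord's clusters with the collocation clusters produced by our concept-based system. The experiment yielded nearly 80% precision and 70% recall. The results show that our collocation clusters outperform JustTheWord's concept clusters and are closer to human judgment. They also indicate that semantic labeling and classification with our method can improve both the efficiency with which L2 learners acquire collocations and the retrieval performance of existing collocation learning tools.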


New computational tools for extracting collocations are a boon to language learners and lexicographers alike. This paper proposes a method for organizing the very large number of collocates that these tools return into semantic thesaurus categories. The approach introduces a thesaurus-based semantic classification model that automatically learns semantic relations in order to classify adjective-noun (A-N) and verb-noun (V-N) collocations into categories. The research focuses on A-N and V-N pairs because they are the most frequent patterns of collocation error and therefore the most relevant to language learners. Our model performs a random walk over the vertices and edges of a weighted graph derived from WordNet semantic relations, and it computes a stationary distribution over semantic labels with an iterative graph algorithm. The semantic label of a collocate is scored by a novel divergence measure, which imposes a thesaurus structure on collocation reference tools. In our experiment, the resulting semantic relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments. The evaluation is conducted on a set of collocations, taken from the collocation usage boxes of the Macmillan English Dictionary, whose collocates vary in their level of abstractness. We also present an experimental evaluation on a collection of 150 multiple-choice questions of the kind commonly used as a similarity benchmark in the TOEFL synonym test. The experimental results show that a thesaurus structure is successfully imposed, that it helps L2 learners produce collocations, and that it significantly outperforms existing collocation reference tools. The resulting semantic classification is closely consistent with human judgments, which serve as fairly refined examples for evaluating the model.
The methodology improves the performance of collocation reference tools and imposes semantic structure on collocations, which is a good starting point for a much improved and more useful presentation of collocations. It also makes the semantic classification of collocations more robust, an attractive feature for organizing broad-coverage machine-readable data that are to be merged and catalogued for use in natural language processing.
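The random walk step described in the abstract can be illustrated with a small sketch. The code below is not the thesis implementation: it assumes a toy, hand-built weighted graph in place of the WordNet-derived one, uses a standard random walk with restart solved by power iteration, and picks the label with the largest stationary probability instead of the divergence measure mentioned above. All node names, edge weights, the `LABEL:` prefix, and the function names are hypothetical.

```python
from collections import defaultdict


def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge to the toy graph."""
    graph[u][v] = weight
    graph[v][u] = weight


def random_walk_with_restart(graph, seeds, damping=0.85, tol=1e-12, max_iter=1000):
    """Approximate the stationary distribution of a walk that follows
    weighted edges with probability `damping` and otherwise jumps back
    to the seed nodes. Solved by power iteration."""
    nodes = set(graph) | set(seeds)
    for neighbours in graph.values():
        nodes |= set(neighbours)
    nodes = sorted(nodes)

    seed_total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / seed_total for n in nodes}
    out_weight = {n: sum(graph.get(n, {}).values()) for n in nodes}

    rank = dict(restart)
    for _ in range(max_iter):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling node: return its mass to the restart distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * restart[n]
                continue
            for v, w in graph[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def best_label(rank, prefix="LABEL:"):
    """Return the label node that holds the most stationary probability."""
    labels = [n for n in rank if n.startswith(prefix)]
    return max(labels, key=lambda n: rank[n])


# Hypothetical toy graph: word nodes linked to each other and to
# thesaurus-category nodes. The weights are made up for illustration.
graph = defaultdict(dict)
add_edge(graph, "superb", "excellent", 1.0)
add_edge(graph, "superb", "great", 0.5)
add_edge(graph, "excellent", "LABEL:Goodness", 1.0)
add_edge(graph, "great", "LABEL:Goodness", 0.5)
add_edge(graph, "great", "LABEL:Size", 1.0)
add_edge(graph, "huge", "LABEL:Size", 1.0)

# Start the walk from the collocate to be labeled.
scores = random_walk_with_restart(graph, seeds={"superb": 1.0})
best = best_label(scores)
print(best)
```

In this toy graph the walk that starts at the collocate "superb" puts more probability on the Goodness category than on the Size category, so the collocate would be filed under Goodness. A real system would build the graph from WordNet relations and thesaurus categories, as the abstract describes.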


