
利用句法與統計之文法搭配與多字詞語之擷取

Extraction of Multiword Expressions related to Grammatical Collocation Based on Syntactic and Statistical Information

Advisor: 張俊盛 (Jason S. Chang)

Abstract


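This thesis studies multiword expressions related to grammatical collocations and proposes a method for extracting grammatical collocations automatically from a corpus. The method uses part-of-speech tags and base-phrase analysis to select, on syntactic grounds, word combinations that fit particular structures, then measures the association between the words statistically to filter for combinations likely to form meaningful grammatical collocations. Next, fixed part-of-speech patterns and the length distribution are learned from known multiword expressions. These are applied to the filtered collocations, such as (“at”, “cost”), to find meaningful multiword expressions that fit the patterns, such as “at cost” or “at all costs”.

To verify statistically that a selected candidate is a meaningful combination, we use mutual information. A score is computed for each collocation candidate and serves as an indicator of how strongly the words are associated. In the experiments, a threshold obtained by training is used to filter out combinations with low mutual information, and the remaining meaningful collocations are used for multiword expression extraction.

The method extracts multiword expressions related to grammatical collocations and, building on the dictionary definition, adds a “preposition-noun-preposition” pattern. For English learning, this work can help learners understand how prepositions combine with content words, and it can supply common grammatical collocations missing from dictionaries. In the future, extracting translations of multiword expressions from bilingual corpora could both strengthen computer-assisted language learning and support computer-assisted translation.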
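The mutual-information filtering step can be illustrated with a minimal Python sketch. It assumes pointwise mutual information in bits, with all probabilities estimated from a flat list of candidate pairs. The abstract does not give the thesis's exact formula, counting scheme, or trained threshold, so the function names, the toy counts, and the threshold of 0.0 are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Map each distinct (w1, w2) pair to its pointwise mutual information in bits.

    Assumption: marginal probabilities are estimated from the pair list itself,
    not from separate corpus-wide unigram counts.
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(w1 for w1, _ in pairs)
    right_counts = Counter(w2 for _, w2 in pairs)
    total = len(pairs)
    scores = {}
    for (w1, w2), n in pair_counts.items():
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(n * total / (left_counts[w1] * right_counts[w2]))
    return scores


def filter_by_threshold(scores, threshold):
    """Keep only the candidates whose score reaches the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


# Toy candidate pairs (invented counts, not data from the thesis).
toy_pairs = (
    [("at", "cost")] * 4
    + [("at", "table")]
    + [("on", "table")] * 4
    + [("on", "cost")]
)
scores = pmi_scores(toy_pairs)
kept = filter_by_threshold(scores, threshold=0.0)  # hypothetical threshold
print(sorted(kept))
```

With these toy counts, the pairs that co-occur more often than chance would predict receive positive scores and survive the filter, while the rarer combinations score below zero and are dropped.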

Parallel Abstract


This thesis studies multiword expressions (MWEs) related to grammatical collocations. We propose a method for automatically extracting grammatical collocations from a corpus. The method selects collocations that fit certain structures, based on part-of-speech information and base-phrase analysis, and then extracts meaningful grammatical collocations through statistical analysis of association strength. In addition to statistics and linguistic knowledge, we rely on the syntactic patterns of multiword expressions. Take the collocation (“at”, “cost”) as an example: patterns learned from seed MWEs enable us to obtain multiword expressions such as “at cost” or “at all costs”. We use mutual information (MI) to evaluate each collocation candidate and filter out those whose MI falls below a threshold trained on real data. Collocations with MI above this lower bound are then used to assist in the extraction of multiword expressions. The grammatical collocations and related multiword expressions can be used in many natural language processing applications, including computer-assisted language learning, parsing, and machine translation.
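The pattern-based step, going from a filtered collocation such as (“at”, “cost”) to expressions like “at cost” or “at all costs”, might look roughly like the Python sketch below. It assumes the input is already tokenized, lemmatized, and tagged with Penn Treebank style tags, and that tags are coarsened to their first two characters so that NN and NNS match the same pattern. The seed tag sequences, the helper names, and the coarsening rule are hypothetical illustrations, not the thesis's actual procedure.

```python
def learn_patterns(seed_tag_sequences):
    """Collect coarse part-of-speech patterns from the tag sequences of seed MWEs."""
    return {tuple(tag[:2] for tag in tags) for tags in seed_tag_sequences}


def extract_mwes(tagged, collocation, patterns):
    """Return surface strings that start at the first word of the collocation,
    end at the second, and whose coarse tag sequence matches a learned pattern.

    tagged: list of (token, lemma, tag) triples for one sentence.
    """
    head, collocate = collocation
    max_len = max(len(p) for p in patterns)
    found = []
    for i, (_, lemma, _) in enumerate(tagged):
        if lemma != head:
            continue
        # Try every span that starts at the head and is no longer than the longest pattern.
        for j in range(i + 1, min(i + max_len, len(tagged))):
            span = tagged[i:j + 1]
            if span[-1][1] != collocate:
                continue
            if tuple(t[2][:2] for t in span) in patterns:
                found.append(" ".join(t[0].lower() for t in span))
    return found


# Hypothetical seed MWEs: "in fact", "at any rate", "in other words".
patterns = learn_patterns([["IN", "NN"], ["IN", "DT", "NN"], ["IN", "JJ", "NNS"]])

sentence_1 = [("They", "they", "PRP"), ("sold", "sell", "VBD"), ("it", "it", "PRP"),
              ("at", "at", "IN"), ("cost", "cost", "NN")]
sentence_2 = [("Avoid", "avoid", "VB"), ("delays", "delay", "NNS"),
              ("at", "at", "IN"), ("all", "all", "DT"), ("costs", "cost", "NNS")]

print(extract_mwes(sentence_1, ("at", "cost"), patterns))
print(extract_mwes(sentence_2, ("at", "cost"), patterns))
```

Matching on lemmas rather than surface forms is what lets the single collocation (“at”, “cost”) cover both the singular and the plural expression in this sketch.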
