
Automatic Generation of Synonyms and Paraphrases based on Web Grams (利用網路連續詞統計之同義詞與重述語的自動產生方法)

Advisor: 張俊盛

Abstract


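A paraphrase expresses the same meaning in different words. Paraphrasing has important applications in both English writing instruction and natural language processing. This thesis proposes a statistical method, based on Web n-grams, for retrieving synonyms and paraphrases. The method first extracts conjunction-based syntactic patterns from a large-scale Web corpus, then filters the candidate words with statistical measures such as rank ratio, overlap coefficient, and mutual information to find synonyms. Starting from the synonyms of single English words, we then use a linguistic search engine to expand to phrasal paraphrases, and we filter those paraphrases further with a statistical classifier. We annotated nearly 200 English phrases to train the classifier. In our experiments, we applied the proposed system to a large-scale Web corpus to retrieve synonyms and paraphrases. The results show that the proposed method retrieves synonyms and paraphrases effectively.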

Parallel Abstract


A paraphrase expresses the same semantic content using different words. The use of paraphrases has been widely discussed in the literature on both teaching English writing and Natural Language Processing (NLP). In this paper, we introduce a new method for extracting synonyms and paraphrases for a given word or phrase based on Web-scale n-grams. In our approach, we use surface patterns to extract trigrams from the Web, and filter out noise using rank ratio, overlap coefficient, and pointwise mutual information (PMI). Furthermore, we derive phrasal paraphrases from the refined synonyms. In our experiments, we applied the system to find phrase-level paraphrases and trained a classifier on about 200 annotated phrases. The experimental results show that the method has the potential to generate good paraphrases of a given phrase.
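
The filtering step named in the abstract can be illustrated with a minimal Python sketch. It uses the standard definitions of pointwise mutual information and overlap coefficient; the function names, thresholds, and toy counts are hypothetical and are not taken from the thesis. The rank ratio measure is omitted because the abstract does not define it.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")
    return math.log2(count_xy * total / (count_x * count_y))


def overlap_coefficient(a, b):
    """Overlap coefficient of two context sets: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def filter_synonyms(target, candidates, contexts, pair_counts, counts, total,
                    min_overlap=0.5, min_pmi=1.0):
    """Keep candidates that pass both thresholds (threshold values are assumptions)."""
    kept = []
    for cand in candidates:
        ov = overlap_coefficient(contexts[target], contexts[cand])
        score = pmi(pair_counts.get((target, cand), 0),
                    counts[target], counts[cand], total)
        if ov >= min_overlap and score >= min_pmi:
            kept.append((cand, ov, score))
    # Strongest association first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Toy, made-up counts standing in for Web n-gram statistics.
contexts = {
    "big": {"house", "city", "problem", "dog"},
    "large": {"house", "city", "number"},
    "table": {"leg", "wood", "kitchen"},
}
counts = {"big": 200, "large": 100, "table": 100}
pair_counts = {("big", "large"): 40, ("big", "table"): 1}

result = filter_synonyms("big", ["large", "table"], contexts,
                         pair_counts, counts, total=10000)
print([word for word, _, _ in result])
```

In this toy setup, a candidate is kept only if it shares enough contexts with the target word and co-occurs with it more often than chance would predict; how the thesis actually combines its measures is not stated in the abstract.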


