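A paraphrase expresses the same meaning in different words. Paraphrasing has important applications in both the teaching of English writing and natural language processing. This paper proposes a statistical method, based on Web n-grams, for retrieving synonyms and paraphrases. The method first extracts conjunction-based syntactic structures from a large-scale Web corpus, then filters the candidate words with statistical measures (rank ratio, overlap coefficient, and mutual information) to find synonyms. Starting from the synonyms of English words, we use a linguistic search engine to expand to phrasal paraphrases, and then screen the candidates further with a statistical classifier. We annotated nearly 200 English phrases to train the classifier. In our experiments, we used the proposed system to retrieve synonyms and paraphrases from a large-scale Web corpus. The results show that the proposed method retrieves synonyms and paraphrases effectively.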
A paraphrase expresses the same semantic content using different words. The use of paraphrases has been widely discussed in the literature on both the teaching of English writing and Natural Language Processing (NLP). In this paper, we introduce a new method for extracting synonyms and paraphrases of a given word or phrase based on Web-scale n-grams. In our approach, we use surface patterns to extract trigrams from Web-scale data, and filter out noise using rank ratio, overlap coefficient, and pointwise mutual information (PMI). Furthermore, we derive phrasal paraphrases from the refined synonyms and screen them with a statistical classifier. In our experiments, we trained the classifier on about 200 annotated phrases and applied the system to find phrase-level paraphrases. The experimental results show that the method has the potential to generate good paraphrases of a given phrase.
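As an illustration of two of the filtering statistics named above, the following Python sketch computes PMI and the overlap coefficient using their standard textbook definitions. The function names, the toy counts, and the choice to compare sets of context words are assumptions made for this example; the abstract does not specify how the statistics are computed or combined, and rank ratio is omitted because its definition is not given here.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y))).

    Counts are raw n-gram frequencies; `total` is the corpus size used to
    turn them into probabilities (hypothetical inputs for this sketch).
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")  # no evidence of co-occurrence
    return math.log2((count_xy * total) / (count_x * count_y))


def overlap_coefficient(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy example: made-up counts and context sets, not data from the paper.
score = pmi(count_xy=50, count_x=100, count_y=200, total=10_000)
shared = overlap_coefficient({"a", "b", "c"}, {"b", "c", "d", "e"})
```

A candidate synonym could then be kept only if both scores exceed chosen thresholds; suitable threshold values would have to be tuned on held-out data and are not stated in the abstract.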