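This paper proposes a method for extracting synonyms from a monolingual corpus and automatically generating phrasal paraphrases. We replace every content word in a given phrase with its synonyms and use n-grams to verify that the resulting paraphrases are usable. The method applies grammatical rules and word embeddings to extract candidate synonyms and filter out non-synonyms. At run time, the system dynamically replaces each content word in the given phrase to produce candidate paraphrases, and ranks them using multiple statistical synonym measures, word embeddings, and n-grams. We present a prototype paraphrasing system, Rephrase2.0 (http://ironman.nlpweb.org:13142/), trained on a Web-scale corpus, as an implementation of the method. Experimental results confirm that combining grammatical rules with word embeddings can automatically generate good-quality paraphrases, which are of some help for language reference and second-language learning.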
We introduce a new method for automatically generating phrasal paraphrases based on synonyms extracted from a monolingual corpus. In our approach, each content word in a given phrase is replaced with its synonyms, and the resulting phrases are validated against n-gram statistics. The method involves extracting and filtering synonymous relations based on surface patterns and word embeddings. At run time, content words in the given phrase are replaced with synonyms to derive candidate paraphrases, and the candidates are re-ranked based on synonym measures, word embeddings, and n-gram statistics. We present a prototype paraphrasing system, Rephraser2.0, available at http://ironman.nlpweb.org:13142/, that applies the method to a Web-scale corpus. Experimental results show that combining surface patterns with word embeddings yields good-quality paraphrases that are useful for language reference and second-language learning.
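The run-time step described above (substitute synonyms for content words, validate candidates with n-grams, then re-rank) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `SYNONYMS` table, its similarity scores (standing in for the synonym measures and word-embedding similarity), the `BIGRAM_COUNTS` values, and the scoring formula are hypothetical toy choices, not the resources or the ranking function used by the actual system.

```python
from itertools import product
from math import log

# Hypothetical toy resources. The scores stand in for synonym measures and
# embedding similarity; the counts stand in for Web-scale n-gram statistics.
SYNONYMS = {
    "big": {"large": 0.9, "huge": 0.7},
    "problem": {"issue": 0.8, "trouble": 0.6},
}
BIGRAM_COUNTS = {
    ("a", "big"): 900, ("big", "problem"): 500,
    ("a", "large"): 800, ("large", "problem"): 120,
    ("a", "huge"): 400, ("huge", "problem"): 300,
    ("big", "issue"): 250, ("large", "issue"): 60,
    ("huge", "issue"): 90, ("big", "trouble"): 20,
    ("huge", "trouble"): 10,
}


def candidates(tokens):
    """Yield (words, similarity) with each word either kept or swapped.

    Assumption: a word counts as a content word if it has an entry in SYNONYMS.
    """
    options = []
    for tok in tokens:
        alternatives = [(tok, 1.0)]  # keeping the original word costs nothing
        alternatives += list(SYNONYMS.get(tok, {}).items())
        options.append(alternatives)
    for combo in product(*options):
        words = tuple(word for word, _ in combo)
        if words == tuple(tokens):
            continue  # skip the unchanged input phrase
        similarity = 1.0
        for _, score in combo:
            similarity *= score
        yield words, similarity


def ngram_score(words):
    """Sum of log bigram counts, or None if any bigram is unattested."""
    total = 0.0
    for bigram in zip(words, words[1:]):
        count = BIGRAM_COUNTS.get(bigram, 0)
        if count == 0:
            return None
        total += log(count)
    return total


def paraphrase(phrase, top_k=5):
    """Return up to top_k candidates, ranked by fluency plus log similarity."""
    tokens = phrase.lower().split()
    ranked = []
    for words, similarity in candidates(tokens):
        fluency = ngram_score(words)
        if fluency is None:
            continue  # n-gram validation: drop unattested candidates
        ranked.append((fluency + log(similarity), " ".join(words)))
    ranked.sort(reverse=True)
    return [text for _, text in ranked[:top_k]]


if __name__ == "__main__":
    print(paraphrase("a big problem", top_k=3))
```

In this sketch a candidate containing any bigram absent from the toy table (for example "large trouble") is discarded outright, while the remaining candidates are ordered by a simple sum of log counts and log similarity. A real system would likely use smoothed or longer n-grams and a tuned combination of the individual measures.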