
重述語的自動生成與改錯

Automatic Generation of Phrasal Paraphrases and Corrections

Advisor: Jason S. Chang (張俊盛)

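Abstract


This thesis presents a method for extracting synonyms from a monolingual corpus and automatically generating phrasal paraphrases. We replace all content words in a given phrase with synonyms and use n-grams to verify that the resulting paraphrases are usable. The method uses syntactic patterns and word embeddings to extract candidate synonyms and filter out non-synonyms. At run time, the system dynamically replaces each content word in the given phrase to produce paraphrase candidates, and ranks them using multiple statistical synonym measures, word embeddings, and n-gram statistics. We present a prototype paraphrasing system, Rephrase2.0 (http://ironman.nlpweb.org:13142/), trained on a Web-scale corpus, as an implementation of the method. Experimental results confirm that combining syntactic patterns with word embeddings can automatically generate good-quality paraphrases, which are of some help for language reference and second-language learning.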

Parallel Abstract


We introduce a new method for automatically generating phrasal paraphrases based on synonyms extracted from a monolingual corpus. In our approach, each content word in a given phrase is replaced with synonyms, and the resulting phrases are validated against n-gram statistics. The method extracts and filters synonym relations using surface patterns and word embeddings. At run time, content words in the given phrase are replaced with synonyms to derive candidate paraphrases, and the candidates are re-ranked based on synonym measures, word embeddings, and n-gram statistics. We present a prototype paraphrasing system, Rephraser2.0, available at http://ironman.nlpweb.org:13142/, that applies the method to a Web-scale corpus. Our experimental results indicate that combining surface patterns with word embeddings yields good-quality paraphrases that are useful for language reference and second-language learning.
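The run-time steps described in the abstract (substitute content words, validate with n-grams, re-rank) might be sketched as follows. This is a minimal illustration under stated assumptions, not the Rephraser2.0 implementation: the synonym table, the n-gram counts, the stopword list, and the scoring formula (synonym score times log count) are all hypothetical stand-ins for the corpus-derived resources and measures the thesis uses.

```python
from itertools import product
from math import log

# Hypothetical toy resources (assumptions for illustration only).
# A single similarity score stands in for the synonym measures and embeddings.
SYNONYMS = {
    "big": {"large": 0.9, "huge": 0.7},
    "problem": {"issue": 0.8, "trouble": 0.6},
}
# Stand-in for Web-scale n-gram counts.
NGRAM_COUNTS = {
    ("a", "big", "problem"): 900,
    ("a", "large", "problem"): 400,
    ("a", "huge", "problem"): 500,
    ("a", "big", "issue"): 300,
    ("a", "huge", "issue"): 120,
}
# Words in this set are treated as function words and never replaced.
STOPWORDS = {"a", "an", "the", "of", "to", "in"}


def candidates(phrase):
    """Yield (words, synonym_score) for every substitution of content words."""
    options = []
    for word in phrase:
        choices = [(word, 1.0)]  # keeping the original word is always allowed
        if word not in STOPWORDS:
            choices += sorted(SYNONYMS.get(word, {}).items())
        options.append(choices)
    for combo in product(*options):
        words = tuple(w for w, _ in combo)
        if words == tuple(phrase):
            continue  # skip the unchanged input phrase
        score = 1.0
        for _, similarity in combo:
            score *= similarity
        yield words, score


def paraphrase(phrase, top_k=5):
    """Keep candidates attested in the n-gram table, then rank them."""
    ranked = []
    for words, syn_score in candidates(phrase):
        count = NGRAM_COUNTS.get(words, 0)
        if count == 0:
            continue  # unattested phrase: rejected by the n-gram check
        ranked.append((syn_score * log(1 + count), " ".join(words)))
    ranked.sort(reverse=True)
    return [text for _, text in ranked[:top_k]]


if __name__ == "__main__":
    print(paraphrase(["a", "big", "problem"]))
```

In this toy setting, "a large issue" is generated as a candidate but dropped because it has no n-gram count, which is the role the abstract assigns to n-gram validation.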
