
Learning to correct preposition errors based on masked language model

Advisor: 張俊盛

Abstract


This thesis presents a preposition error correction method that corrects potential preposition errors in a sentence without relying on manually annotated data. In our approach, we insert placeholders at positions where a preposition may be missing, and use a masked language model to replace or delete prepositions and placeholders in the sentence, thereby correcting potential preposition errors. Our method converts sentences from a native-speaker corpus into training data consisting of masked sentences paired with the masked-out prepositions (or the symbol “[NONE]”), representing missing, wrong, and unnecessary preposition errors, and uses this synthetic data to train a masked language model so that it can correct preposition errors. The training data is created by masking existing prepositions or by inserting mask symbols at positions prone to unnecessary prepositions. We evaluate our method on the BEA-2019 and CoNLL-2014 datasets. Preliminary results show that our method outperforms previous work.
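The data-synthesis step described above can be sketched as follows. This is an illustrative assumption, not the thesis's actual implementation: the preposition list, the helper name `synthesize_pairs`, and the rule for choosing insertion positions are all hypothetical stand-ins (the thesis restricts insertions to positions prone to unnecessary prepositions, which would require a more selective heuristic):

```python
# Hedged sketch of the training-data synthesis: turn one corpus sentence
# into (masked sentence, filler) pairs, where the filler is either a real
# preposition or the "[NONE]" symbol.

PREPOSITIONS = {"in", "on", "at", "for", "to", "of", "with", "by", "from", "about"}
MASK = "[MASK]"
NONE_LABEL = "[NONE]"

def synthesize_pairs(sentence):
    """Yield (masked_sentence, filler) pairs from one corpus sentence.

    - Each existing preposition is replaced by [MASK], with the original
      preposition as the filler (covers wrong/missing-preposition errors).
    - A [MASK] is also inserted at gaps not adjacent to a preposition,
      labeled [NONE] (covers unnecessary-preposition errors).
    """
    tokens = sentence.split()
    pairs = []
    # Mask out existing prepositions.
    for i, tok in enumerate(tokens):
        if tok.lower() in PREPOSITIONS:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            pairs.append((" ".join(masked), tok.lower()))
    # Insert masks at gaps with no neighboring preposition, labeled [NONE].
    for i in range(1, len(tokens)):
        if (tokens[i].lower() not in PREPOSITIONS
                and tokens[i - 1].lower() not in PREPOSITIONS):
            masked = tokens[:i] + [MASK] + tokens[i:]
            pairs.append((" ".join(masked), NONE_LABEL))
    return pairs

pairs = synthesize_pairs("She waited at the station")
```

A trained MLM would then learn, from such pairs, to predict either a preposition or “[NONE]” for each masked position.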

Abstract (English)


We introduce AccuPrep, a method for correcting preposition errors in a given sentence without using annotated training data. In our approach, we insert placeholders for potentially missing prepositions and then attempt to replace or delete prepositions and placeholders with a masked language model (MLM). The method converts sentences in a given reference corpus into a dataset of pairs of masked sentences and filler prepositions (or the “[NONE]” symbol) representing missing, wrong, and unnecessary preposition errors, and trains an MLM on this dataset to correct preposition errors. The masks are created either by replacing existing prepositions or by inserting masks at positions where unnecessary prepositions are likely to appear. We present a prototype based on the proposed method and test it on the BEA-2019 and CoNLL-2014 shared tasks. Preliminary evaluation shows that our approach outperforms previous work.
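The inference-time correction described above can be sketched as follows, under stated assumptions: the function names, the preposition list, and the single-edit control flow are illustrative, and `predict` stands in for the trained MLM rather than any actual model from the thesis:

```python
# Hedged sketch of correction at inference time: mask each preposition,
# ask a masked language model for the best filler, and apply it --
# replacing a wrong preposition, or deleting it when the model
# predicts "[NONE]".

PREPOSITIONS = {"in", "on", "at", "for", "to", "of", "with", "by", "from", "about"}
MASK = "[MASK]"
NONE_LABEL = "[NONE]"

def correct(sentence, predict):
    """Correct at most one preposition in `sentence`.

    `predict(masked_sentence)` stands in for the trained MLM: it returns
    the most likely filler for the [MASK] position ("[NONE]" means the
    position should hold no preposition at all).
    """
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in PREPOSITIONS:
            masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
            filler = predict(masked)
            if filler == NONE_LABEL:           # unnecessary preposition: delete
                return " ".join(tokens[:i] + tokens[i + 1:])
            if filler != tok.lower():          # wrong preposition: replace
                return " ".join(tokens[:i] + [filler] + tokens[i + 1:])
    return sentence

# Toy stand-in for the trained MLM.
fake_mlm = lambda masked: "on"
fixed = correct("I rely in your help", fake_mlm)
```

Handling missing prepositions would additionally insert placeholders at candidate gaps before querying the model, mirroring the synthesis step used for training.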

References


[1] Christopher Bryant and Ted Briscoe. Language Model Based Grammatical Error Correction without Annotated Training Data. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 247–253, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-0529. URL https://aclanthology.org/W18-0529.
[2] Christopher Bryant, Mariano Felice, and Ted Briscoe. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 793–805, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi: 10.18653/v1/P17-1074. URL https://www.aclweb.org/anthology/P17-1074.
[3] Christopher Bryant, Mariano Felice, Øistein E. Andersen, and Ted Briscoe. The BEA-2019 Shared Task on Grammatical Error Correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52–75, Florence, Italy, August 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-4406. URL https://aclanthology.org/W19-4406.
[4] Yo Joong Choe, Jiyeon Ham, Kyubyong Park, and Yeoil Yoon. A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 213–227, Florence, Italy, August 2019. Association for Computational Linguistics. doi: 10.18653/v1/W19-4423. URL https://aclanthology.org/W19-4423.
[6] Mariano Felice, Zheng Yuan, Øistein E. Andersen, Helen Yannakoudakis, and Ekaterina Kochmar. Grammatical error correction using hybrid systems and type filtering. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 15–24, Baltimore, Maryland, June 2014. Association for Computational Linguistics. doi: 10.3115/v1/W14-1702. URL https://aclanthology.org/W14-1702.
