We introduce AccuPrep, a method for correcting preposition errors in a given sentence without using human-annotated training data. In our approach, we insert placeholders at positions where a preposition may be missing, and then use a masked language model (MLM) to replace or delete prepositions and placeholders, thereby correcting potential preposition errors. The method converts sentences in a native-speaker reference corpus into a dataset of pairs of a masked sentence and a filler preposition (or the symbol “[NONE]”), representing missing, wrong, and unnecessary preposition errors, and uses this synthetic data to train an MLM to correct preposition errors. The masks are created either by masking existing prepositions or by inserting mask symbols at positions where unnecessary prepositions are likely to occur. We present a prototype based on the proposed method and evaluate it on the BEA-2019 and CoNLL-2014 shared task datasets. Preliminary results show that our approach outperforms previous work.
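The data-synthesis step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the preposition inventory, the mask token, and the helper name `make_training_pairs` are all assumptions made for the example, and the insertion position for the “[NONE]” case is chosen at random here rather than by the paper's (unspecified) heuristic for likely unnecessary-preposition positions.

```python
import random

# Illustrative assumptions; the paper does not list its exact
# preposition inventory or mask/filler symbols.
PREPOSITIONS = {"in", "on", "at", "for", "of", "to", "with", "by"}
MASK = "[MASK]"
NONE_LABEL = "[NONE]"

def make_training_pairs(sentence, rng=random):
    """Convert one native-corpus sentence into (masked sentence, filler) pairs.

    1. Replace each existing preposition with [MASK]; the filler is the
       original preposition (trains the MLM to fix wrong prepositions and
       to fill placeholders inserted for missing ones).
    2. Insert [MASK] at one position adjacent to no preposition; the
       filler is [NONE] (trains the MLM to delete unnecessary prepositions).
    """
    tokens = sentence.split()
    pairs = []
    # Case 1: mask each existing preposition.
    for i, tok in enumerate(tokens):
        if tok.lower() in PREPOSITIONS:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            pairs.append((" ".join(masked), tok.lower()))
    # Case 2: insert a mask at a random slot not adjacent to a preposition.
    slots = [i for i in range(1, len(tokens))
             if tokens[i].lower() not in PREPOSITIONS
             and tokens[i - 1].lower() not in PREPOSITIONS]
    if slots:
        i = rng.choice(slots)
        masked = tokens[:i] + [MASK] + tokens[i:]
        pairs.append((" ".join(masked), NONE_LABEL))
    return pairs

pairs = make_training_pairs("She waited at the station for an hour",
                            rng=random.Random(0))
for masked, filler in pairs:
    print(filler, "<-", masked)
```

Each resulting pair is a standard MLM training example, so the synthesized dataset can be used to fine-tune any off-the-shelf masked language model; at correction time, a filler prediction of “[NONE]” signals that the masked preposition or placeholder should be deleted.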