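This paper proposes a Chinese spelling correction method that automatically learns to correct potential spelling errors in a sentence. We apply a neural machine translation (NMT) model to Chinese spelling correction; that is, we translate a sentence that may contain spelling errors into a correct sentence. We train an NMT spelling-correction model on right-and-wrong sentence pairs extracted from newspaper edit logs and from artificially generated error data. In the training phase, we first extract sentences involving spelling-error corrections from the newspaper edit logs. To expand the training data, we use a confusion set to generate sentences containing spelling errors, and then train the model on these data. Experimental results show that the model trained on the edit logs plus the artificial error data performs better.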
We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs drawn from newspaper edit logs and from artificially generated data. The method involves extracting sentences that contain spelling-correction edits from the edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs that expand the training data, and training the NMT model. Evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of a Chinese spelling check system.
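The two data-preparation steps described above (filtering edit-log pairs down to spelling corrections, and injecting confusion-set errors into correct sentences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the length-preserving "few substituted characters" filter, the `max_subs` threshold, and the tiny `CONFUSION_SET` are hypothetical stand-ins, and the NMT model itself is not shown.

```python
import random

# Hypothetical confusion set (assumption): maps a correct word to commonly
# confused misspellings. The paper's actual confusion set is not reproduced.
CONFUSION_SET = {
    "在": ["再"],
    "已經": ["以經"],
}


def is_spelling_edit(before: str, after: str, max_subs: int = 2) -> bool:
    """Assumed heuristic: a spelling correction keeps the sentence length
    and substitutes between 1 and max_subs characters."""
    if len(before) != len(after) or before == after:
        return False
    subs = sum(1 for a, b in zip(before, after) if a != b)
    return subs <= max_subs


def extract_spelling_pairs(edit_log):
    """Keep (before, after) edit-log pairs that look like spelling fixes."""
    return [(before, after) for before, after in edit_log
            if is_spelling_edit(before, after)]


def find_all(text: str, word: str):
    """Yield every start index of word in text."""
    start = text.find(word)
    while start != -1:
        yield start
        start = text.find(word, start + 1)


def make_artificial_pairs(sentences, confusion_set, seed=0):
    """Inject one confusion-set error into each correct sentence to form a
    (wrong, right) pair; sentences with no candidate word are skipped."""
    rng = random.Random(seed)
    pairs = []
    for right in sentences:
        candidates = [(i, w) for w in confusion_set for i in find_all(right, w)]
        if not candidates:
            continue
        i, w = rng.choice(candidates)
        wrong = right[:i] + rng.choice(confusion_set[w]) + right[i + len(w):]
        pairs.append((wrong, right))
    return pairs


if __name__ == "__main__":
    log = [("我們再學校見面", "我們在學校見面"),    # spelling fix: kept
           ("今天天氣很好", "今天的天氣非常好")]    # rewording: filtered out
    print(extract_spelling_pairs(log))
    print(make_artificial_pairs(["他已經回家了", "沒有候選詞"], CONFUSION_SET))
```

Both functions emit (wrong, right) pairs in the same orientation, so real and artificial data can simply be concatenated into one parallel corpus with the misspelled sentence as source and the corrected sentence as target. Fixing the random seed keeps the generated data reproducible across runs.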