
Homophone Correction Using the BERT Language Model (應用BERT語言模型於同音別字之訂正)

Advisor: 魏世杰
The full text will be available for download on 2025/02/21.

Abstract


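Writing is a tool for recording language, and every character carries its own meaning. Miswritten or wrongly chosen characters, however, can cause the intended message to be misunderstood or make a text harder to read. As technology has become widespread, most people now communicate by typing text messages; typing removes the possibility of miswritten characters (錯字), but wrongly chosen characters (別字) appear again and again.

With the release of pre-trained models, natural language processing, a field that once depended on large amounts of computation, has flourished, because applications no longer need the resources to train from scratch. This study fine-tunes the pre-trained BERT architecture to build a typo detection system, which reaches a detection accuracy of 0.878. Following the detection system, a typo correction system is built on the BERT pre-trained model; its accuracy in correcting sentences that contain typos is 0.747. Together these form a system that effectively identifies and corrects Chinese typos.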

Parallel Abstract


Text is a tool for recording language, and every word carries its own meaning. As technology has become widespread, most people communicate by typing text messages. Typos, however, may cause the original meaning to be misunderstood or make a text harder to read. With the advent of pre-trained models, the field of natural language processing has made significant progress, because each application is spared the initial cost of time-consuming training from scratch.

This work constructs a typo detection system by fine-tuning the BERT model. The accuracy of typo detection reached 0.878. Following the typo detection system, a typo correction system was built on the BERT pre-trained model. Its accuracy in correcting sentences that contain typos was 0.747. Together, the two form a system that effectively identifies and corrects Chinese typos.
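The abstract describes a two-stage flow (detect, then correct) but does not give the mechanism. The following is a minimal sketch of one way such a flow could be wired together, not the thesis's implementation. It assumes that a detector has already flagged suspect positions, that correction is restricted to same-sounding candidates, and that the reported sentence-level accuracy is an exact-match rate. The `HOMOPHONES` table, `toy_score`, and all function names are hypothetical; in a real system the score would come from a BERT masked language model.

```python
from typing import Callable, Dict, List

# Hypothetical homophone table: each character maps to same-sounding characters.
HOMOPHONES: Dict[str, List[str]] = {
    "在": ["再"],
    "再": ["在"],
}


def correct_sentence(
    sentence: str,
    flagged: List[int],
    score: Callable[[str, int, str], float],
) -> str:
    """Replace each flagged character with its best-scoring homophone.

    `flagged` holds the positions reported by a detector (assumed given).
    The original character is always kept as a candidate.
    """
    chars = list(sentence)
    for i in flagged:
        candidates = [chars[i]] + HOMOPHONES.get(chars[i], [])
        chars[i] = max(candidates, key=lambda c: score("".join(chars), i, c))
    return "".join(chars)


def sentence_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of sentences whose corrected form matches the reference exactly."""
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def toy_score(sentence: str, position: int, candidate: str) -> float:
    """Toy stand-in for a masked-language-model probability (assumption)."""
    following = sentence[position + 1:position + 2]
    if candidate == "再":
        # Prefer "再" only when it is followed by "見", as in "再見".
        return 1.0 if following == "見" else 0.0
    return 0.5


if __name__ == "__main__":
    fixed = correct_sentence("明天在見", [2], toy_score)
    print(fixed)
    print(sentence_accuracy([fixed, "我在加"], ["明天再見", "我在家"]))
```

Restricting candidates to a homophone table keeps the search space small and prevents the model from rewriting a character into something that sounds different, which is the usual motivation for this design in homophone correction.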

Keywords

BERT; Typos; Attention Mechanism; Deep Learning; NLP

