A Multi-Model Ensemble Method for Post-OCR Processing of Ancient Chinese Books

This paper proposes a multi-model ensemble method for post-OCR processing. Using its own layout analysis and alignment algorithms, the method integrates several types of deep learning models (a full-page character detection model, a character recognition model, a line recognition model, and a pretrained language model) and outperforms any single one of them. The full-text error rate reaches 1.64%, only 23% of the average error rate of the individual models. The method generalizes well across a variety of common ancient-book layouts.
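The abstract does not say how the models' outputs are aligned and combined. As a rough illustration only, the sketch below shows one common way to fuse several OCR readings of the same text line: align each hypothesis to a pivot with Levenshtein (edit-distance) alignment, then take a per-character majority vote, in the spirit of ROVER-style voting. This is an assumed, swapped-in technique, not the authors' algorithm. The function names `align` and `vote`, the use of the first hypothesis as pivot, and the choice to ignore characters a hypothesis inserts relative to the pivot are hypothetical simplifications. The sketch also leaves out the layout analysis and the pretrained language model that the paper uses.

```python
# Hypothetical sketch: edit-distance alignment + per-character majority vote.
# Not the paper's method; names and simplifications are illustrative assumptions.
from collections import Counter
from typing import List, Optional, Tuple

GAP = None  # marks a position where a hypothesis has no character


def align(pivot: str, hyp: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Levenshtein-align hyp against pivot; return (pivot_char, hyp_char) pairs."""
    n, m = len(pivot), len(hyp)
    # dp[i][j] = edit distance between pivot[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if pivot[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # pivot char missing in hyp
                           dp[i][j - 1] + 1)         # extra char in hyp
    # Backtrace from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                0 if pivot[i - 1] == hyp[j - 1] else 1):
            pairs.append((pivot[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((pivot[i - 1], GAP))  # deletion in hyp
            i -= 1
        else:
            pairs.append((GAP, hyp[j - 1]))  # insertion in hyp
            j -= 1
    pairs.reverse()
    return pairs


def vote(hypotheses: List[str]) -> str:
    """Fuse several readings of one line by majority vote over pivot positions."""
    pivot = hypotheses[0]
    # columns[k] collects every model's reading of pivot position k
    columns = [[c] for c in pivot]
    for hyp in hypotheses[1:]:
        k = 0
        for p, h in align(pivot, hyp):
            if p is GAP:
                continue  # insertion relative to pivot: ignored in this sketch
            columns[k].append(h)
            k += 1
    out = []
    for col in columns:
        # most_common breaks ties by first occurrence, i.e. in favour of the pivot
        best, _ = Counter(col).most_common(1)[0]
        if best is not GAP:
            out.append(best)
    return "".join(out)
```

With three readings of the line 學而時習之, where one model misreads 而 as 面 and another drops 之, `vote(["學面時習之", "學而時習之", "學而時習"])` returns `"學而時習之"`. Ties fall back to the pivot's reading because `Counter.most_common` orders equal counts by first occurrence. For scale, if 1.64% is 23% of the single-model average, that average is roughly 1.64 / 0.23 ≈ 7.1%.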

Keywords

post-OCR; ancient scriptures; model ensemble; layout analysis; deep learning
