
A Post-OCR Method of Multi-Model Ensemble for Chinese Ancient Scriptures

Abstract

This paper proposes a multi-model ensemble method for OCR post-processing. Using a distinctive layout analysis and alignment algorithm, it integrates several types of deep learning models: a full-page character detection model, a character recognition model, a line recognition model, and a pre-trained language model. The ensemble outperforms a single model: the full-text error rate is 1.64%, only 23% of the average single-model error rate. The method generalizes well across the conventional layout styles of ancient Chinese books.
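The abstract does not say how the aligned model outputs are combined or how the error rate is computed, so the following is only a minimal sketch under stated assumptions: the outputs are already aligned to equal length, they are fused by a per-position majority vote, and the error rate is the character-level edit distance divided by the reference length. The names `GAP`, `vote`, `edit_distance`, and `error_rate`, as well as the sample strings, are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical placeholder marking "no character here" after alignment.
GAP = "\u25a1"


def vote(aligned_outputs):
    """Fuse pre-aligned, equal-length strings by per-position majority vote."""
    if not aligned_outputs:
        return ""
    length = len(aligned_outputs[0])
    if any(len(s) != length for s in aligned_outputs):
        raise ValueError("outputs must be aligned to the same length")
    fused = []
    for chars in zip(*aligned_outputs):
        # Ties go to the character seen first, i.e. the earlier model.
        best, _ = Counter(chars).most_common(1)[0]
        if best != GAP:
            fused.append(best)
    return "".join(fused)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute or match
        prev = cur
    return prev[-1]


def error_rate(hypothesis, reference):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(hypothesis, reference) / len(reference)


# Illustrative data only: three imagined recognizers, each wrong in a different place.
reference = "子曰學而時習之"
outputs = [
    "子日學而時習之",  # confuses 曰 with 日
    "子曰學而時習之",  # correct
    "子曰學而畤習之",  # confuses 時 with 畤
]
fused = vote(outputs)
single_rates = [error_rate(o, reference) for o in outputs]
fused_rate = error_rate(fused, reference)
```

In this toy case the models disagree in different positions, so the vote recovers the reference even though two of the three outputs contain an error. As a rough check on the reported figures, if 1.64% is exactly 23% of the average single-model error rate, that average would be about 1.64 / 0.23 ≈ 7.1%.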

