Document Image Rectification Based on a Deep Learning Architecture

This paper proposes a document image rectification method that makes full use of the skip connections in UNet and is trained on a realistic synthetic 3D dataset. The proposed network consists of two cascaded UNets: the first predicts the 3D coordinate map of the input distorted image, and the second then predicts its forward map. Before the distorted image enters these networks, its page mask is predicted in advance and supplied to both UNets, so that they can concentrate on learning the page content. Separating page-mask prediction from content prediction in this way improves the performance of both UNets. To make the skip connections pass more suitable features to the decoder, we also adjust the number of convolution blocks in the first UNet. Experimental results show that, compared with recent studies, the proposed method achieves significant improvements in both the MS-SSIM (multi-scale structural similarity) and LD (local distortion) metrics.
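The cascaded pipeline described above can be sketched as follows. This is a minimal NumPy illustration of the data flow only, not the author's implementation. The tiny random-weight networks (`TinyUNet`), the use of 1x1 convolutions in place of real convolution blocks, feeding the page mask to each UNet as an extra input channel, the `blocks` parameter, and all sizes are assumptions made for illustration. The abstract does not say whether the second UNet also receives the image, so this sketch gives it only the predicted 3D coordinate map and the mask.

```python
import numpy as np


def conv1x1(w, x):
    """1x1 convolution: mixes the channels of x (C, H, W) with weights w (out, C)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def avg_pool2(x):
    """2x2 average pooling over (C, H, W); H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """Nearest-neighbour 2x upsampling over (C, H, W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


class TinyUNet:
    """Hypothetical toy one-level UNet with random, untrained weights.

    `blocks` (hypothetical) sets how many conv blocks the encoder stage has,
    standing in for the per-UNet block count that the paper adjusts.
    """

    def __init__(self, in_ch, out_ch, width=8, blocks=1, seed=0):
        rng = np.random.default_rng(seed)

        def init(o, i):
            return rng.standard_normal((o, i)) * np.sqrt(2.0 / i)

        self.enc = [init(width, in_ch)] + [init(width, width) for _ in range(blocks - 1)]
        self.mid = init(width, width)
        self.dec = init(width, 2 * width)  # decoder input = [upsampled, skip]
        self.head = init(out_ch, width)

    def __call__(self, x):
        for w in self.enc:
            x = relu(conv1x1(w, x))
        skip = x                                          # (width, H, W)
        bottleneck = relu(conv1x1(self.mid, avg_pool2(skip)))
        up = upsample2(bottleneck)                        # back to (width, H, W)
        # Skip connection: concatenate encoder features with decoder features.
        x = relu(conv1x1(self.dec, np.concatenate([up, skip], axis=0)))
        return conv1x1(self.head, x)                      # linear output head


def predict_maps(image, mask_net, unet_3d, unet_map):
    """image (3, H, W) -> mask (1, H, W), 3D coords (3, H, W), 2D map (2, H, W)."""
    mask = sigmoid(mask_net(image))  # per-pixel page probability
    # Assumption: the mask is concatenated as an extra input channel.
    coords3d = unet_3d(np.concatenate([image, mask], axis=0))   # first UNet
    map2d = unet_map(np.concatenate([coords3d, mask], axis=0))  # second UNet
    return mask, coords3d, map2d


mask_net = TinyUNet(3, 1, seed=0)
unet_3d = TinyUNet(4, 3, blocks=3, seed=1)  # block count chosen arbitrarily
unet_map = TinyUNet(4, 2, seed=2)
image = np.random.default_rng(42).random((3, 64, 64))
mask, coords3d, map2d = predict_maps(image, mask_net, unet_3d, unet_map)
print(mask.shape, coords3d.shape, map2d.shape)  # (1, 64, 64) (3, 64, 64) (2, 64, 64)
```

The concatenation of `skip` with the upsampled bottleneck is what makes the toy network UNet-shaped: full-resolution encoder features reach the decoder directly, and changing the number of encoder blocks changes which features travel along that path. A practical implementation would typically use trained 3x3 convolutions, several resolution levels, and a deep learning framework.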

Keywords

Document image rectification; deep learning; convolutional neural network; UNet architecture; MS-SSIM metric; LD metric


