Deep Learning Based Document Image Rectification

Advisor: 林慧珍

Abstract


This paper proposes a document image rectification method that makes full use of UNet's skip connections and is trained on a realistic synthetic 3D dataset. The proposed network consists of two connected UNets that predict, in sequence, the 3D coordinate map and the forward map of the distorted input image. Before the distorted image is passed to the network, its page mask is predicted and supplied to both UNets so that they can concentrate on learning the page content. Separating page mask prediction from content prediction improves the performance of both UNets. So that the skip connections transmit more appropriate features, the number of convolution blocks in the first UNet is adjusted accordingly. Experimental results show that the proposed method achieves significant improvements over recent studies in both the MS-SSIM (multi-scale structural similarity) and LD (local distortion) metrics.
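
The data flow described above can be illustrated with a minimal sketch. It is not the author's implementation: the three networks are replaced by hypothetical stubs that only check channel counts, and two details are assumptions not stated in the abstract, namely that the page mask is joined to each UNet's input by channel concatenation and that the forward map has two channels (one 2D coordinate per pixel).

```python
from typing import Callable, List

Channel = List[List[float]]   # one H x W plane
Tensor = List[Channel]        # channels-first image: C x H x W


def make_stub(in_channels: int, out_channels: int) -> Callable[[Tensor], Tensor]:
    """Hypothetical stand-in for a trained network; it only checks wiring."""
    def net(x: Tensor) -> Tensor:
        if len(x) != in_channels:
            raise ValueError(f"expected {in_channels} channels, got {len(x)}")
        h, w = len(x[0]), len(x[0][0])
        # Return an all-zero output with the requested number of channels.
        return [[[0.0] * w for _ in range(h)] for _ in range(out_channels)]
    return net


def concat_channels(a: Tensor, b: Tensor) -> Tensor:
    """Stack two channels-first tensors along the channel axis."""
    return a + b


def predict_maps(image: Tensor, mask_net, coord_net, forward_net):
    # Step 1: predict the page mask before anything else.
    mask = mask_net(image)
    # Step 2: first UNet predicts the 3D coordinate map from image + mask
    # (concatenation is an assumption of this sketch).
    coords_3d = coord_net(concat_channels(image, mask))
    # Step 3: second UNet predicts the forward map from 3D coordinates + mask.
    forward_map = forward_net(concat_channels(coords_3d, mask))
    return mask, coords_3d, forward_map


# A 3-channel (RGB) dummy image of height 4 and width 6.
image = [[[0.5] * 6 for _ in range(4)] for _ in range(3)]

mask, coords_3d, forward_map = predict_maps(
    image,
    mask_net=make_stub(3, 1),     # RGB -> 1-channel page mask
    coord_net=make_stub(4, 3),    # RGB + mask -> 3D coordinate map (x, y, z)
    forward_net=make_stub(4, 2),  # 3D coords + mask -> 2-channel forward map
)
```

Because each stub rejects an input with the wrong channel count, the sketch makes explicit which intermediate outputs feed each stage; real UNets would take the place of the stubs.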
