This paper proposes a document image rectification method that makes full use of UNet's skip connections and is trained on a realistic 3D synthetic dataset. The proposed architecture consists of two cascaded UNets, which sequentially predict the 3D coordinate map and the forward mapping of the input distorted image. In addition, the page mask of the distorted image is predicted in advance and fed into both UNets, helping them focus on learning the document content. Separating the tasks of page mask prediction and content prediction in this way measurably improves the performance of both UNets. To make the skip connections of the first UNet transmit more suitable features, we adjust the number of its convolution blocks accordingly. Experimental results show that the proposed method achieves significant improvements in both MS-SSIM and Local Distortion (LD) metrics compared with recent studies.