
基於生成對抗網路與離散小波轉換之彩色文件影像二值化

Binarization of Color Document Images Based on Generative Adversarial Networks and Discrete Wavelet Transform

Advisor: 江正雄

Abstract


In recent years, image-recognition technology has advanced rapidly, and new network architectures continue to be introduced. Machine learning can be applied to document image enhancement, detection, OCR, and related tasks, addressing problems in many different domains. This thesis focuses on document binarization, a long-studied problem that is typically applied to the digital archiving of historical documents, which suffer from various types of degradation due to their age. To improve the subsequent analysis of document images, separating the foreground text from the background is an essential task. We propose a two-stage color document image enhancement and binarization method based on generative adversarial networks (GANs): the training pipeline consists of two GAN stages that share the same network architecture. Each GAN comprises a generator and a discriminator. The generator adopts an encoder-decoder design built on U-Net++, which provides good performance in image segmentation; U-Net++ augments the original U-Net encoder-decoder connections with additional skip connections that link selected encoder outputs to the corresponding decoder stages. The encoder of the generator uses the EfficientNet-B6 architecture, and the discriminator adopts a PatchGAN architecture similar to that of Pix2Pix GAN. Experiments are conducted on the public datasets of the Document Image Binarization Competition (DIBCO), and the results are compared with other methods on the same data. Ablation experiments are performed on the proposed algorithm, and the best-performing configuration is selected as the main training setting. In addition, the proposed document binarization system is compared with various state-of-the-art (SOTA) algorithms, and the results show that it performs well against them.
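
To make the described architecture concrete, the following is a minimal sketch of the generator-discriminator pairing named in the abstract: a U-Net++ encoder-decoder with an EfficientNet-B6 backbone and a PatchGAN-style discriminator. It assumes the segmentation_models_pytorch and PyTorch libraries purely for illustration; the thesis does not state which implementation or hyperparameters it uses, so none of the settings below should be read as the author's.

```python
# Minimal sketch (assumption): U-Net++ generator with an EfficientNet-B6 encoder,
# paired with a PatchGAN-style discriminator as popularized by Pix2Pix.
# segmentation_models_pytorch is used only for convenience here; the thesis does
# not say which implementation it relies on.
import torch
import torch.nn as nn
import segmentation_models_pytorch as smp

# Generator: encoder-decoder (U-Net++) producing a 1-channel binarization map.
generator = smp.UnetPlusPlus(
    encoder_name="efficientnet-b6",  # encoder backbone named in the abstract
    encoder_weights=None,            # ImageNet pretraining would be a typical (assumed) choice
    in_channels=3,                   # color document image
    classes=1,                       # foreground/background map
    activation="sigmoid",
)

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator that classifies overlapping image patches."""
    def __init__(self, in_channels=4):  # e.g. input image (3) + candidate output (1)
        super().__init__()
        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, 4, stride, 1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.model = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1),  # per-patch real/fake logits
        )

    def forward(self, image, candidate):
        # Conditional setup: judge the candidate output jointly with its input image.
        return self.model(torch.cat([image, candidate], dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 3, 256, 256)       # dummy color document patch
    y = generator(x)                      # predicted binarization map, 1x1x256x256
    d = PatchDiscriminator()(x, y)        # patch-level real/fake decisions
    print(y.shape, d.shape)
```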

Parallel Abstract (English)


In recent years, image recognition technology has advanced rapidly, and new network architectures are introduced continually. Machine learning can be used for document image enhancement, detection, OCR, and related tasks, and can be applied to many kinds of problems. This thesis focuses on document binarization, a problem with a long research history that is generally applied to the digital archiving of historical documents, which exhibit various types of degradation because of their age. To improve the effectiveness of subsequent analysis of document images, separating the foreground text information from the background is an important task. We propose a two-stage color document image enhancement and binarization method based on generative adversarial networks (GANs). The training process is divided into two GAN stages, both of which share the same network architecture. Each GAN includes a generator and a discriminator. The generator uses an encoder-decoder design based on U-Net++, which offers good performance in image segmentation; U-Net++ extends the original U-Net encoder-decoder connections with additional skip connections so that selected encoder outputs are linked to the corresponding decoder stages. The generator's encoder uses the EfficientNet-B6 architecture, and the discriminator adopts a PatchGAN architecture from Pix2Pix GAN. The experimental results are obtained on the public datasets of the Document Image Binarization Competition (DIBCO) and compared against other methods on the same data. Ablation experiments are conducted on the proposed algorithm, and the best-performing configuration is chosen as the main training mode. In addition to these comparisons, the results are also compared against various state-of-the-art (SOTA) algorithms. The results in this thesis show that the proposed document binarization system performs well compared with other SOTA algorithms.
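
The thesis title also mentions the discrete wavelet transform, although the abstract does not describe how it enters the pipeline. The following is a minimal sketch, assuming a single-level 2-D DWT applied to each channel of a color document image with PyWavelets; the wavelet choice and the per-channel decomposition are illustrative assumptions, not the method of the thesis.

```python
# Minimal sketch (assumption): single-level 2-D discrete wavelet transform of a
# color document image using PyWavelets. The thesis does not specify how its DWT
# step is applied; this only illustrates splitting an image into subbands.
import numpy as np
import pywt

def dwt2_per_channel(image, wavelet="haar"):
    """Decompose an HxWx3 image into per-channel (cA, cH, cV, cD) subbands."""
    subbands = []
    for c in range(image.shape[2]):
        cA, (cH, cV, cD) = pywt.dwt2(image[:, :, c], wavelet)
        subbands.append((cA, cH, cV, cD))
    return subbands

if __name__ == "__main__":
    img = np.random.rand(256, 256, 3)  # stand-in for a color document image
    bands = dwt2_per_channel(img)
    cA, cH, cV, cD = bands[0]
    print(cA.shape)                    # (128, 128) approximation subband of channel 0
```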

