A Study on Applying Convolutional Neural Networks to the Recognition of Stamped Documents

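Each year, the Management Offices of the Irrigation Agency across Taiwan (the Irrigation Associations before restructuring) must enter statistics on irrigation management, accounting, finance, personnel, and related matters into the Statistical Report System for Irrigation Associations (SRSIA). As the final step, the report that has been approved by the reporting unit's supervisor and "stamped" is scanned to a PDF file and uploaded to the system. When staff upload an "unstamped" report by mistake, however, the system cannot detect the error and return the report promptly. The report is usually returned only after the supervising authority notices the problem, which delays the filing. The initial idea of this study is to apply artificial intelligence so that a machine can automatically recognize whether an uploaded report is stamped. Convolutional neural networks (CNNs) extract image features automatically and have been shown in many competitions to substantially reduce image recognition error rates. The aim of this study is to apply CNNs to build a model that recognizes whether a report is stamped. Because the architecture of a recognition model often determines its quality, this paper tests four CNNs: a 6-layer CNN architecture, an 11-layer CNN architecture, VGGNet, and Pre-trained VGGNet. The models are evaluated using the Categorical Cross-Entropy Loss function. For training, 2,536 unstamped reports and 978 stamped reports were collected, labeled automatically by machine, and merged into a single dataset. This dataset was then processed into Dataset A, which contains only upright reports, and Dataset B, which contains upright, upside-down, 90-degree right-rotated, and 90-degree left-rotated reports. Each of the four CNNs was trained on both datasets. Training used Early Stopping: if the loss fails to decrease five consecutive times, training stops in order to prevent overfitting. The results show that all four CNNs can train recognition models with accuracy above 99%, and that Pre-trained VGGNet performs best, training a recognition model with 99.9% accuracy.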
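The Early Stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `early_stopping_epoch`, the idea of comparing each loss with the best loss seen so far, and the sample loss values are all assumptions made for illustration. The abstract also does not say whether the monitored loss is the training or the validation loss, so the sketch accepts any sequence of per-epoch losses.

```python
def early_stopping_epoch(losses, patience=5):
    """Return the 1-based epoch at which training would stop,
    or None if the stopping condition is never met.

    Assumption: "no decrease" means the loss is not strictly lower
    than the best loss observed so far.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without a new best loss
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical per-epoch losses, not results from the paper.
history = [0.90, 0.50, 0.30, 0.31, 0.32, 0.30, 0.33, 0.35, 0.29]
stop_epoch = early_stopping_epoch(history, patience=5)
```

With these sample values the best loss is reached at epoch 3, and the five following epochs bring no improvement, so the rule stops training before the lower loss at epoch 9 is ever seen. This is the usual trade-off of a fixed patience value.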

Keywords

Convolutional neural network; stamped document recognition; artificial intelligence

Parallel Abstract

Each year, the Management Offices of the Irrigation Agency in Taiwan must report statistical data on irrigation management, accounting, finance, personnel, and related matters to the "Statistical Report System for Irrigation Associations" (SRSIA). The approved reports bearing "Stamps" must then be scanned as PDF files and uploaded to SRSIA. However, SRSIA cannot reject a mistakenly uploaded file without "Stamps", which motivated the idea of applying an artificial intelligence method to build a model that identifies whether an uploaded PDF file bears "Stamps". Convolutional neural networks (CNNs) extract image features automatically and have been shown in many competitions to greatly reduce image recognition errors. The purpose of this research is to use CNNs to establish a model that identifies whether a report bears "Stamps". Because the architecture of an identification model often determines its quality, this paper tests four CNN models: a 6-layer CNN architecture, an 11-layer CNN architecture, VGGNet, and Pre-trained VGGNet. The Categorical Cross-Entropy Loss function is used to evaluate the models. The training and testing data comprised 2,536 PDF files without "Stamps" and 978 files with "Stamps", which were labeled automatically by machine and merged into one dataset. The files in this dataset were then processed into Dataset-A, which contains only upright reports, and Dataset-B, which contains upright, upside-down, 90-degree right-rotated, and 90-degree left-rotated reports. The four CNN models were trained and tested on Dataset-A and Dataset-B. Training applied Early Stopping with a patience of 5 to avoid overfitting. The results indicate that all four CNN architectures can produce identification models with an accuracy of more than 99%. Among them, the Pre-trained VGGNet model performed best, yielding an identification model with an accuracy of 99.9%.
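The four orientations that make up Dataset-B can be sketched as follows. This is an illustrative sketch only, not the authors' preprocessing pipeline: it represents a scanned page as a small row-major grid of pixel values, and the function names `rotate_right_90` and `four_orientations` as well as the dictionary keys are hypothetical.

```python
def rotate_right_90(image):
    """Rotate a row-major 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def four_orientations(image):
    """Return the upright, right-rotated, upside-down, and
    left-rotated versions of a page (hypothetical helper)."""
    upright = [list(row) for row in image]
    right = rotate_right_90(upright)        # 90 degrees clockwise
    upside_down = rotate_right_90(right)    # 180 degrees
    left = rotate_right_90(upside_down)     # 90 degrees counterclockwise
    return {
        "upright": upright,
        "right_90": right,
        "upside_down": upside_down,
        "left_90": left,
    }


# A tiny stand-in for a scanned page (made-up pixel values).
page = [[1, 2],
        [3, 4]]
variants = four_orientations(page)
```

Each rotation is built from the previous one, so a single clockwise helper is enough to produce all four orientations. The stamped or unstamped label of a page is unchanged by rotation, which is why the same labels can be reused for every variant.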

Parallel Keywords

CNN; Approved PDF; Artificial intelligence
