A Study on Applying Deep Learning to Chinese Character Recognition

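Optical character recognition (OCR), whose main purpose is to recognize the text in existing paper documents, plays an important role in computer vision. Traditional OCR applications, however, deal mostly with scanned documents: the user scans a document with a scanner, the image goes through a series of preprocessing steps, and OCR then recognizes the text. In an era when everyone carries a mobile device with a camera, recognizing text directly with the camera at hand would be both convenient and economical. Character images captured by a camera, however, may be skewed, rotated, or corrupted by noise, which makes OCR classification difficult. The goal of this thesis is therefore to design an OCR system suited to camera-captured images of Traditional Chinese characters. A series of experiments yielded two Chinese character recognition systems that can recognize skewed Chinese characters: Augmented CNN and Hierarchical OCR. The former tolerates a smaller range of skew angles but achieves a higher recognition rate; the latter has a slightly lower recognition rate but tolerates a larger range of skew angles. For Chinese character images skewed by 0° to 20°, the two systems achieve similar accuracy, both around 95%; for images skewed by 20° to 40°, however, the accuracy of Hierarchical OCR is clearly higher than that of Augmented CNN.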

Keywords

Character Recognition; Deep Learning; Convolutional Neural Network

Parallel Abstract

Optical character recognition (OCR), which is mainly used to recognize text in existing written documents, plays an important role in the field of computer vision. In traditional OCR applications, however, the topics discussed are mainly scanner-based: a scanner captures the image of the document and, after a series of preprocessing steps, OCR identifies the text. In an age of mobile devices and personal cameras, it would be both convenient and affordable to recognize text directly with a camera. Character images captured by a camera, however, may be skewed, rotated, or disturbed by noise, which leads to difficulties in OCR classification. The research objective of this thesis is therefore to design a camera-based OCR system suitable for images of Traditional Chinese characters. After a series of experiments, we arrived at two recognition systems that can recognize skewed Chinese characters: Augmented CNN and Hierarchical OCR. The former withstands only a smaller skew angle but has a higher accuracy rate, while the latter withstands a larger skew angle at a slightly lower recognition rate. When the skew angle of the character image is between 0° and 20°, the two systems have nearly the same accuracy, both around 95%; when the skew angle is between 20° and 40°, however, the accuracy of Hierarchical OCR is significantly higher than that of Augmented CNN.

Parallel Keywords

Optical Character Recognition; Deep Learning; Convolutional Neural Network

