
基於深度學習之行草中文古文辨識

Cursive Chinese Calligraphy Recognition For Historical Documents—A Deep Learning Approach

Advisor: 廖文宏

Abstract


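Calligraphy was an important means of writing in ancient China and is also an art form. Among its scripts, cursive calligraphy is freer in convention and structure than the others and reveals the calligrapher's personality. This artistic expression, however, makes cursive characters harder to recognize; even for humanities experts, digitizing historical texts remains a time-consuming task. Moreover, optical character recognition (OCR) does not perform well enough for practical use on cursive Chinese characters, whose structure is simplified and whose styles vary widely. The need for auxiliary tools that assist cursive calligraphy recognition has therefore been raised. In this study, we use a deep learning-based approach to recognize cursive calligraphy. No publicly available cursive Chinese character dataset currently exists, so we collected images from the web and curated them manually, compiling a dataset of 42862 images covering 5301 characters. Because research on cursive calligraphy is scarce, we treat cursive recognition as a sub-problem of handwritten Chinese character recognition and survey the related work. Building on the M6 network architecture, which has performed well on handwritten Chinese character recognition, we propose EM6, which adds Batch Normalization and an additional fully connected layer; DenseNet-18, simplified from DenseNet-121; and a 3-way network framework that takes the characteristics of handwritten Chinese characters into account. Although these architectures reach similar accuracy during training, EM6 has the highest test accuracy. We therefore chose the EM6 model and, using 二南堂法帖 as test data, achieved 64.3% Top-1 accuracy and 80.5% Top-5 accuracy on a recognition task of 18668 test images.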

Parallel Abstract


Calligraphy was one of the most important writing tools, as well as a cultural art, in ancient China. Compared with other calligraphy styles, the cursive script is the least restricted and often exhibits the personality of the calligrapher. However, this style-oriented expression makes the cursive script hard to recognize even for trained experts. Furthermore, optical character recognition (OCR) systems are designed for printed texts and perform poorly on cursive scripts. The call for auxiliary tools for cursive Chinese calligraphy text recognition has thus arisen. In this study, we employ a deep learning-based approach to the recognition of cursive Chinese calligraphy. As there are currently no open datasets for cursive Chinese calligraphy, we collected 42862 images of 5301 different Chinese characters written in cursive script to train our neural network. Since there is little previous research on this topic, we treat the cursive Chinese calligraphy recognition task as a variant of offline handwriting recognition. We propose and investigate three neural network architectures: Enhanced M6 (EM6), DenseNet-18, and a 3-way neural network. EM6 is constructed by adding batch normalization and an additional fully connected layer to the M6 architecture to decrease the impact of overfitting. DenseNet-18 is simplified from DenseNet-121 and has a shallower network depth. The 3-way neural network is devised based on our observations of Chinese writing. These networks achieve similar performance during the training phase. However, EM6 outperforms the others in terms of test accuracy and hence becomes our model of choice. We evaluate the proposed EM6 model on 18668 cursive Chinese calligraphy images extracted from BiSouth model calligraphy and achieve 64.3% Top-1 accuracy and 80.5% Top-5 accuracy.
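
The Top-1 and Top-5 figures above are top-k accuracies: a test image counts as correct if its true character is among the k classes the model scores highest. The following is a minimal sketch of that metric in plain Python, not the thesis's evaluation code. The function name `top_k_accuracy`, the toy scores, and the four-class setup are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy data (hypothetical): 3 samples, 4 character classes.
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(f"Top-1: {top1:.3f}, Top-2: {top2:.3f}")
```

With 5301 character classes, Top-5 accuracy is a natural fit for an assistive tool: the system can present a short list of candidates and leave the final choice to a human reader.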

