Thesis

Handwritten and Printed Chinese Character Recognition By Using Computer Font Type Chinese Characters into Convolutional Neural Network

(Chinese title: 利用電腦字型建立卷積神經網絡之中文漢字模型進行手寫與印刷字體辨識)

Advisor: 黃乾綱

Abstract


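This study aims to improve the recognition of both handwritten and printed Chinese characters. Using existing fonts in a variety of styles, both available online and built into computers, we take the 5,000 and 10,000 most commonly used characters and apply image processing techniques to deform and preprocess these glyphs in several ways, producing the required training data. Using convolutional neural networks (CNNs), a machine learning technique, we train a single model that can recognize both handwritten and printed Chinese characters. The model parameters are tuned and optimized through repeated validation, and the model is evaluated experimentally on several different representative test sets. The core goal of this study is to use image processing to generate effective training data that raises the model's recognition accuracy, so that it recognizes characters correctly across the different representative test sets.

The main contributions of this study are: (1) a method for training a model that recognizes both handwritten and printed characters using only existing computer fonts; and (2) optimization of printed-character recognition for classical documents, addressing problems such as blurred and rare characters in images of classical texts. Experiments on real data, namely the early-Republican-era Jing Bao (京報) newspaper, Buddhist scriptures from the Qisha Canon (磧砂藏), and the public 2013 CASIA handwritten Chinese character test set, show that the proposed model and method reach an accuracy of 69.9% on Jing Bao, 89.29% on the Buddhist scriptures, and 58.24% on the handwriting test set. Compared with commonly used existing OCR software, this improves accuracy by about 2 to 3%.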
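The abstract does not list which deformations were applied to the font-rendered glyphs. As a minimal sketch, assuming a glyph has already been rendered from a font file into a small grayscale grid (1.0 = ink, 0.0 = paper), the stdlib-only Python below shows the kind of augmentation described: a random affine warp (rotation, shear, scale) combined with stroke thickening, stroke thinning, blur, and salt-and-pepper noise. All function names, the grid representation, and the parameter ranges are hypothetical choices for illustration, not the thesis's settings; rasterizing the 5,000 or 10,000 characters from real fonts is outside the sketch.

```python
import math
import random

# Hypothetical representation: a rendered glyph as a grayscale grid,
# 1.0 = ink, 0.0 = paper. Names and parameter ranges are illustrative only.
Glyph = list[list[float]]


def affine_warp(img: Glyph, angle_deg: float = 0.0, shear: float = 0.0,
                scale: float = 1.0) -> Glyph:
    """Rotate, shear and scale about the centre (inverse mapping, nearest neighbour)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Forward matrix M = scale * Rotation(a) @ Shear(shear); det(M) = scale ** 2.
    m00, m01 = scale * cos_a, scale * (cos_a * shear - sin_a)
    m10, m11 = scale * sin_a, scale * (sin_a * shear + cos_a)
    det = m00 * m11 - m01 * m10
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx, dy = c - cx, r - cy
            # Apply M^-1 to find the source pixel that lands on (r, c).
            sx = round((m11 * dx - m01 * dy) / det + cx)
            sy = round((-m10 * dx + m00 * dy) / det + cy)
            if 0 <= sy < h and 0 <= sx < w:
                out[r][c] = img[sy][sx]
    return out


def _filter3x3(img: Glyph, combine, outside=None) -> Glyph:
    """Apply `combine` to each pixel's 3x3 neighbourhood. Out-of-bounds
    neighbours are skipped, or replaced by `outside` when it is given."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        vals.append(img[rr][cc])
                    elif outside is not None:
                        vals.append(outside)
            row.append(combine(vals))
        out.append(row)
    return out


def dilate(img: Glyph) -> Glyph:
    """Thicken strokes (e.g. bold faces or ink bleed)."""
    return _filter3x3(img, max)


def erode(img: Glyph) -> Glyph:
    """Thin strokes (e.g. worn type); the image border counts as paper."""
    return _filter3x3(img, min, outside=0.0)


def box_blur(img: Glyph) -> Glyph:
    """Soften edges to imitate blurry scans."""
    return _filter3x3(img, lambda v: sum(v) / len(v))


def salt_and_pepper(img: Glyph, p: float, rng: random.Random) -> Glyph:
    """With probability p, replace a pixel by pure ink or pure paper."""
    return [[(1.0 if rng.random() < 0.5 else 0.0) if rng.random() < p else v
             for v in row] for row in img]


def random_variants(img: Glyph, n: int, seed: int = 0) -> list[Glyph]:
    """Produce n randomly deformed copies of one rendered glyph."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        v = affine_warp(img, angle_deg=rng.uniform(-8, 8),
                        shear=rng.uniform(-0.2, 0.2), scale=rng.uniform(0.9, 1.1))
        stroke = rng.choice([None, dilate, erode])
        if stroke is not None:
            v = stroke(v)
        if rng.random() < 0.5:
            v = box_blur(v)
        variants.append(salt_and_pepper(v, p=0.02, rng=rng))
    return variants


if __name__ == "__main__":
    # Toy 7x7 plus sign standing in for a rendered character such as 十.
    plus = [[1.0 if r == 3 or c == 3 else 0.0 for c in range(7)] for r in range(7)]
    for v in random_variants(plus, n=2, seed=42):
        print("\n".join("".join("#" if p > 0.5 else "." for p in row) for row in v), "\n")
```

Each font in the pool yields one clean glyph per character; calling `random_variants` on it turns that into many training samples per class. Thinning and blurring roughly imitate worn or blurred characters in old printed documents, while the affine warp loosely imitates the irregular shapes of handwriting.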

Parallel Abstract (English)


The main purpose of this thesis is to improve handwritten Chinese character recognition (HCCR) and printed Chinese character recognition (PCCR) for traditional, pre-modern print. Using existing Chinese fonts of different styles, taken from the computer system and from online sources, we take the 5,000 and 10,000 most commonly used characters and apply several deformations and preprocessing steps with image processing techniques to produce training data. With convolutional neural networks (CNNs), we then train a single model that recognizes both handwritten and printed Chinese characters. The main goal of this thesis is to find effective training features, optimize the parameters, and fine-tune the model for better performance.

The main results of this thesis are: (1) a method for training a model that recognizes both handwritten and printed characters using only existing computer fonts; (2) for printed characters, a focus on early traditional printed fonts, improving the recognition of rare characters and of characters that are damaged or blurred in the original documents; and (3) experiments on the early-Republican-era Jing Bao (京報) newspaper, Buddhist scriptures from the Qisha Canon (磧砂藏), and the public 2013 CASIA handwritten Chinese character test set. The results show that the proposed model and method reach an accuracy of 69.9% on the newspaper, 89.29% on the Buddhist scriptures, and 58.27% on the handwriting test set. Compared with commonly used existing OCR software, the model improves accuracy by about 2 to 3%.

Keywords: HCCR, PCCR, Image Processing, Machine Learning, Convolutional Neural Networks
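The abstract names convolutional neural networks but does not give the architecture. As a minimal sketch of what one convolutional stage computes on a glyph image, the pure-Python forward pass below applies a "valid" 2-D convolution (implemented as cross-correlation, as most deep-learning libraries do), a ReLU, and 2x2 max pooling, then a fully connected layer and a softmax over class scores. The function names, the hand-picked vertical-edge kernel, and the two-class dense weights are all hypothetical; in a real model the kernels and weights are learned, and a framework such as PyTorch or TensorFlow would be used.

```python
import math

Matrix = list[list[float]]


def conv2d(img: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[bias + sum(kernel[i][j] * img[r + i][c + j]
                        for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    return [[max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
             for c in range(0, len(x[0]) - 1, 2)]
            for r in range(0, len(x) - 1, 2)]


def dense(features: list[float], weights: list[list[float]],
          biases: list[float]) -> list[float]:
    """Fully connected layer: one score per character class."""
    return [b + sum(w * f for w, f in zip(ws, features))
            for ws, b in zip(weights, biases)]


def softmax(scores: list[float]) -> list[float]:
    """Turn class scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 6x6 glyph with one vertical stroke in column 2 (hypothetical input).
    glyph = [[1.0 if c == 2 else 0.0 for c in range(6)] for _ in range(6)]
    vertical_edge = [[-1.0, 1.0], [-1.0, 1.0]]  # hand-picked kernel, not learned
    fmap = max_pool2x2(relu(conv2d(glyph, vertical_edge)))
    features = [v for row in fmap for v in row]  # flatten the feature map
    # Two hypothetical classes with made-up weights, only to show the data flow.
    probs = softmax(dense(features, [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],
                          [0.0, 0.0]))
    print(fmap, [round(p, 3) for p in probs])  # → [[2.0, 0.0], [2.0, 0.0]] [0.982, 0.018]
```

A character-recognition CNN typically stacks several such convolution/ReLU/pooling stages and ends with a dense layer whose output size equals the number of character classes (5,000 or 10,000 in this setting); training then adjusts the kernels and dense weights by backpropagation.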

