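A Study on Improving Handwritten Digit Recognition Accuracy by Optimizing Convolutional Neural Networks through Hyperparameter Tuning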

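Since AlphaGo defeated top professional Go players in 2016, deep learning has attracted worldwide attention, and its applications have grown rapidly in fields such as language analysis, image recognition, text recognition, and fault prediction. Image recognition is the most widely used of these, with applications including license plate recognition, handwritten digit recognition, product inspection, and medical diagnosis. Handwriting recognition has likewise become common, for example in signing documents and in converting handwritten characters in images to text, so its accuracy needs to improve. In handwritten digit recognition research, the MNIST handwritten digit dataset is commonly used to train image processing systems and to train and test deep learning models. Many recent studies use MNIST for recognition; their reported model run times range from two hours to seven days, and their recognition accuracy mostly falls between 92% and 95%. A few studies exceed 99%, but not consistently, which this study attributes to the lack of further hyperparameter tuning. MNIST consists entirely of two-dimensional monochrome digit images and is therefore a low-complexity image set, yet reported accuracy has not improved. The aim of this study is to raise the routinely achieved recognition accuracy above 99%, as a reference for later research on image recognition accuracy. Convolutional neural networks (CNN) are a type of neural network designed specifically for processing two-dimensional images. This study builds CNN models to recognize and classify the MNIST dataset and evaluates the accuracy of the trained models. MNIST contains 60,000 training samples and 10,000 test samples. The 60,000 training samples are used to train the CNN model, whose hyperparameters are tuned with the Taguchi method; the 10,000 test samples are then used to evaluate the model's accuracy, and a confusion matrix is built to examine the results. Network architectures with one and with two convolution operations are built and paired with different hyperparameter configurations to recognize the MNIST images. The results show that the CNN model with two convolution operations reaches a recognition accuracy above 99%. That architecture takes about 30 minutes to run, twice as long as the single-convolution architecture, but far less than the run times reported by other researchers.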
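The Taguchi tuning step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the study's actual procedure: the abstract does not say which hyperparameters, levels, or orthogonal array were used, so the four factors, their levels, and the choice of the standard L9 array are assumptions. `toy_accuracy` is a hypothetical stand-in for training the CNN and measuring test accuracy, and its numbers are invented placeholders, not results.

```python
import math
from collections import Counter
from itertools import combinations

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels (0-based).
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]

# Hypothetical factors and levels, chosen only for illustration;
# the abstract does not say which hyperparameters were tuned.
FACTORS = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "batch_size": [32, 64, 128],
    "filters": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}


def is_orthogonal(array, n_levels=3):
    """Check that every pair of columns contains each level pair equally often."""
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if len(counts) != n_levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True


def sn_larger_is_better(values):
    """Taguchi larger-the-better S/N ratio in dB: -10 * log10(mean(1 / y^2))."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in values) / len(values))


def main_effects(array, sn_values, n_levels=3):
    """Mean S/N ratio for each level of each factor (one list per column)."""
    effects = []
    for col in range(len(array[0])):
        per_level = []
        for level in range(n_levels):
            picked = [sn for row, sn in zip(array, sn_values) if row[col] == level]
            per_level.append(sum(picked) / len(picked))
        effects.append(per_level)
    return effects


def best_levels(effects):
    """Pick, for each factor, the level index with the highest mean S/N ratio."""
    return tuple(max(range(len(e)), key=e.__getitem__) for e in effects)


def toy_accuracy(levels):
    """Hypothetical stand-in for 'train the CNN and return test accuracy'.

    The bonus values are invented placeholders, not measured results.
    """
    bonus = [
        [0.00, 0.02, 0.01],
        [0.01, 0.015, 0.005],
        [0.00, 0.005, 0.01],
        [0.005, 0.01, 0.00],
    ]
    return 0.93 + sum(bonus[f][lv] for f, lv in enumerate(levels))


# One S/N ratio per run (a single replicate per run in this sketch).
sn_values = [sn_larger_is_better([toy_accuracy(run)]) for run in L9]
best = best_levels(main_effects(L9, sn_values))
best_config = {name: lv[i] for (name, lv), i in zip(FACTORS.items(), best)}
print(best, best_config)
```

In a real experiment, `toy_accuracy` would be replaced by a function that trains the network with the given configuration and returns its test accuracy, ideally over several repeated runs so the S/N ratio also reflects run-to-run variation. The appeal of the orthogonal array is that nine runs stand in for the 81 combinations of a full factorial design, and the selected combination need not be one of the nine runs actually executed.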

Keywords

Neural network; Convolutional neural network; Taguchi method; Hyperparameter tuning; MNIST; Handwritten digit recognition

Parallel Abstract

Since AlphaGo defeated top professional Go players in 2016, deep learning has attracted global attention, and its application has developed rapidly in fields such as language analysis, image recognition, text recognition, and fault prediction. In handwritten digit recognition research, MNIST is often used to train image processing systems and to train and test deep learning models. Many recent studies use MNIST for recognition; their model run times range from 2 hours to 7 days, and their recognition accuracy mostly falls between 92% and 95%. The purpose of this research is to raise the routinely achieved recognition accuracy above 99%, as a reference for subsequent studies on image recognition accuracy. A convolutional neural network (CNN) is a type of neural network designed for processing two-dimensional images. This research uses CNN models to recognize and classify MNIST and evaluates the accuracy of the trained models. The 60,000 training samples are used to train the CNN model, whose hyperparameters are tuned with the Taguchi method; the 10,000 test samples are then used to evaluate the model's accuracy, and a confusion matrix is built to examine the results. Network architectures with one and with two convolution operations were constructed and paired with different hyperparameter configurations to recognize MNIST. The results show that the CNN model with two convolution operations achieves a recognition accuracy above 99%. That architecture takes about 30 minutes to run, twice as long as the single-convolution architecture, but far less than the run times reported by other researchers.
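The confusion matrix used for evaluation can be illustrated with a short sketch. It assumes the common convention that rows are true digits and columns are predicted digits (the abstract does not state which convention the study used), and the labels below are made-up toy values, not the study's predictions.

```python
def confusion_matrix(y_true, y_pred, n_classes=10):
    """Count predictions: matrix[t][p] is how often true class t was predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def misclassified_pairs(matrix):
    """List (true, predicted, count) for every non-zero off-diagonal cell."""
    return [
        (t, p, count)
        for t, row in enumerate(matrix)
        for p, count in enumerate(row)
        if t != p and count > 0
    ]


# Toy labels for illustration only: one 4 read as 9 and one 9 read as 4.
y_true = [0, 1, 2, 3, 4, 4, 7, 9, 9, 9]
y_pred = [0, 1, 2, 3, 4, 9, 7, 9, 4, 9]

cm = confusion_matrix(y_true, y_pred)
print(accuracy(cm))
print(misclassified_pairs(cm))
```

The off-diagonal cells show which digits are confused with which, a detail that a single accuracy figure hides.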

Parallel Keywords

Neural network; Convolutional neural network; MNIST; Taguchi method; Hyperparameter tuning; Handwritten digit recognition

