
Finger Spelling Recognition System Design Based on Convolutional Neural Network Using Channel-level Network Slimming

Advisor: 陳永平
This thesis will be open for download on 2024/07/27.

Abstract


In recent years, applications of human-computer interaction (HCI) have become increasingly widespread, and finger spelling recognition is an indispensable image recognition system among them. However, deploying a large deep convolutional neural network for finger spelling recognition inevitably demands more computational cost and storage space, and inference cannot be carried out efficiently. To address the problem of oversized models, this thesis proposes a new model pruning method that reduces both the model size and the run-time memory consumption; the proposed method requires neither a pruning threshold nor any special software/hardware accelerator. The well-known VGGNet is adopted for this study, and the procedure consists of four steps: image preprocessing, training the original model, model pruning, and fine-tuning. First, each color image and depth image is resized by zero padding. Second, the original VGGNet model is trained. Third, the importance of the output channels in each layer is determined from the parameters of the batch normalization layers together with the statistical distribution of each layer's output, and the less important channels are pruned to compress the model. Finally, the pruned model is fine-tuned in step with the pruning schedule. Experimental results confirm that this pruning method reduces the number of parameters by nearly 95%, while the finger spelling system still achieves an average accuracy of 87.80% under leave-one-out cross-validation, a better recognition result than the methods reported in the literature.
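The zero-padding step described above can be sketched as follows. This is a minimal illustration only: the thesis does not state the exact target resolution here, so the function name and dimensions below are hypothetical.

```python
import numpy as np

def zero_pad(img, target_h, target_w):
    """Center an image on a zero-filled canvas of target_h x target_w.
    Works for grayscale (H, W) and color/depth (H, W, C) arrays."""
    h, w = img.shape[:2]
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    out[top:top + h, left:left + w] = img  # original pixels, zeros elsewhere
    return out
```

Padding with zeros preserves the aspect ratio of the hand region while producing the fixed input size a VGG-style network expects, instead of distorting the image by direct rescaling.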

Parallel Abstract


Human-Computer Interaction (HCI) has recently found increasingly wide application in many forms of communication, and finger spelling recognition is one of the indispensable vision recognition systems in HCI. However, if a huge deep convolutional neural network is applied to a finger spelling recognition system, it requires more computational cost and storage space and cannot perform inference efficiently. To solve this problem, a novel network pruning method is proposed in this thesis; it reduces both the model size and the run-time memory. In addition, the proposed method needs neither a pruning threshold nor any special software or hardware accelerator for efficient inference. This thesis adopts the well-known VGGNet, and the procedure is composed of four steps: image preprocessing, original model training, pruning, and fine-tuning. First, each RGB image and depth image is resized by zero padding. Then, the original VGGNet model is trained. Next, whether each channel in each layer is pruned is decided by leveraging the parameters of the batch normalization layers and by observing the statistical distribution of each layer's output within the whole network. Finally, the pruned model is fine-tuned with a new learning strategy. The experimental results show that the proposed pruning method can prune nearly 95% of the parameters while achieving an average accuracy of 87.80% under Leave-One-Out Cross Validation, which is better than other approaches.
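The thesis's own criterion combines BN parameters with per-layer output statistics precisely to avoid a hand-set threshold, and those details are not given in this abstract. As a reference point only, conventional channel-level network slimming ranks every channel in the network by the magnitude of its batch-normalization scale factor γ and removes the smallest fraction; the sketch below shows that baseline, with hypothetical function and parameter names.

```python
import numpy as np

def keep_masks_by_gamma(gammas, prune_ratio):
    """Baseline channel-level slimming criterion (not the thesis's exact method):
    rank all channels in the network by |gamma|, the BN scaling factor, and mark
    the smallest `prune_ratio` fraction for removal.
    `gammas` is a list of 1-D arrays, one per BN layer."""
    all_g = np.sort(np.concatenate([np.abs(g) for g in gammas]))
    k = int(len(all_g) * prune_ratio)              # number of channels to prune
    cutoff = all_g[k] if k < len(all_g) else np.inf
    # keep-mask per layer: True = channel survives, False = channel is pruned
    return [np.abs(g) >= cutoff for g in gammas]
```

Because γ multiplies each channel's normalized output, a near-zero γ means the channel contributes little downstream, so its filter (and the matching input weights of the next layer) can be removed before fine-tuning restores the accuracy.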
