利用資料熵進行神經網路之壓縮

Exploiting Data Entropy for Neural Network Compression

Advisor: 劉邦鋒

Abstract


Convolutional neural networks (CNNs) have achieved great success across a wide range of computer vision tasks. However, because hardware and software resources are limited, network compression is an important technique for making CNNs smaller and for speeding up both training and inference. We focus on channel pruning, a branch of network compression that first evaluates the importance of each channel in a convolution layer and then prunes away the less important channels. In this thesis, we propose the weighted mutual information metric, which outperforms the traditional L1-norm metric and other entropy-based metrics. We first compute the mutual information between the feature maps and the labels, and use it to remove the part of the entropy that is irrelevant to classification. We then consider the effect of passing a continuous random variable through a set of filter weights and sum the resulting output entropies. We perform channel pruning with the Simplenet model on three datasets: SVHN, CIFAR-10, and CIFAR-100. When the parameter percentage is set to 0.3 for all convolution layers, the proposed weighted mutual information metric achieves 1.52%, 13.24%, and 7.90% higher accuracy than the output L1 metric on SVHN, CIFAR-10, and CIFAR-100, respectively. In the global pruning experiments, the proposed metric achieves about 2% higher accuracy than the output L1 metric on SVHN when the parameter percentage is 0.45, and about 1.5% higher accuracy on CIFAR-100 when the parameter percentage is 0.53. The only exception is CIFAR-10, where, at a parameter percentage of 0.40, our metric is about 5% less accurate than the output L1 metric; under the same setting, however, the entropy metric estimated from a Gaussian distribution unexpectedly outperforms the output L1 metric by about 51%.
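
As a rough illustration of mutual-information-based channel scoring (not the thesis's weighted mutual information metric, whose exact formulation is not given in this abstract), the sketch below scores each channel of a convolution layer by the mutual information between its globally pooled activation and the class labels, then keeps a fixed fraction of the highest-scoring channels. The function names, the global-average-pooling step, and the use of scikit-learn's k-NN-based `mutual_info_classif` estimator are all illustrative assumptions.

```python
# Minimal sketch: score channels by MI(pooled activation, label), keep the top fraction.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def channel_mi_scores(feature_maps: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """feature_maps: (N, C, H, W) activations of one conv layer; labels: (N,) class ids."""
    pooled = feature_maps.mean(axis=(2, 3))      # global average pooling -> (N, C)
    return mutual_info_classif(pooled, labels)   # one MI estimate per channel

def select_channels(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Indices of the channels to keep: the highest-MI fraction of all channels."""
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    return np.sort(np.argsort(scores)[-n_keep:])

# Toy usage with random data standing in for real activations and labels.
rng = np.random.default_rng(0)
fmaps = rng.standard_normal((256, 64, 8, 8)).astype(np.float32)
labels = rng.integers(0, 10, size=256)
keep = select_channels(channel_mi_scores(fmaps, labels), keep_ratio=0.3)
print(f"keeping {keep.size} of 64 channels")
```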

Parallel Abstract (English)


Convolutional neural networks (CNNs) achieve great success, especially in computer vision tasks. However, due to limited hardware and software resources, model compression is an important technique for making CNNs smaller and faster to train and to run at inference. We focus on channel pruning, a part of model compression that evaluates the importance of each channel in a convolution layer and prunes away the less important channels. In this paper, we propose the weighted mutual information metric, which outperforms the L1-norm pruning metric and other entropy metrics. We first compute the mutual information between feature maps and labels in order to remove information that is not relevant to the classification task. We then consider the effect of passing a continuous random variable through the filter weights and estimate the output entropy. We perform channel pruning on three datasets, SVHN, CIFAR-10, and CIFAR-100, using the model Simplenet. When the parameter percentage is 0.3 for all convolution layers, our weighted mutual information method has 1.52%, 13.24%, and 7.90% higher accuracy than the output L1 metric on the SVHN, CIFAR-10, and CIFAR-100 datasets, respectively. In the global pruning experiment, our weighted mutual information metric has about 2% higher accuracy than the output L1 metric when the parameter ratio is about 0.45 on SVHN. On CIFAR-100, our metric has about 1.5% higher accuracy than the output L1 metric when the parameter ratio is about 0.53. The only exception is CIFAR-10, where our metric is about 5% worse than the output L1 metric when the parameter ratio is about 0.40. The entropy metric estimated according to a Gaussian distribution unexpectedly outperforms the output L1 metric by about 51% under the same conditions.
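
For comparison, the sketch below shows a common L1-norm filter-pruning baseline in PyTorch: filters are ranked by the L1 norm of their weights and only a given fraction (the "parameter percentage") is kept. This is an illustrative stand-in rather than the thesis's code; the output L1 metric mentioned above may instead be computed on the output feature maps, and the function and variable names here are hypothetical.

```python
# Minimal sketch of L1-norm filter ranking under a target parameter percentage.
import torch
import torch.nn as nn

def l1_filter_ranking(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
    """Return the indices of the output channels to keep, ranked by weight L1 norm."""
    # Conv2d weight shape: (out_channels, in_channels, kH, kW); sum |w| per filter.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(round(keep_ratio * scores.numel())))
    return torch.sort(torch.topk(scores, n_keep).indices).values

# Toy usage: keep 30% of the filters of a freshly initialized layer.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
keep = l1_filter_ranking(conv, keep_ratio=0.3)
print(f"keeping {keep.numel()} of {conv.out_channels} filters")
```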

Parallel Keywords

model compression, filter pruning, entropy, CNN, machine learning

