
特徵圖之壓縮格式及直接索引設計

Compression Format and Direct Indexing Module for Feature Maps

Advisor: 黃世旭

Abstract


To address the high operating frequency demanded of the decoder circuit in direct indexing, and to further reduce the storage of index values (offsets), this thesis designs a new recompression format that effectively shrinks the size of the stored offsets, together with a corresponding circuit improvement and a gating method; compared with a general direct indexing circuit, the design requires lower power and fewer operations. Because feature maps inherit the spatial continuity of images, sparse or dense pixels of the same nature are very likely to be adjacent. To let such pixels share an offset and thereby reduce the storage needed for sparse compression, we recompress with a block compression technique that cuts the number of offset bits stored. Since actively sparsifying the weights degrades the accuracy of a convolutional model, and the zero-value ratio of the weights after quantization is not high, we compress only the activation values. On the circuit side, we propose an improved compression circuit that exploits the shareable index values of the compression format to gate the decoder circuit, reducing the power needed to realize sparse compression. Experimental results show that our method effectively reduces the number of offset bits stored, and that the power consumption of the overall encoder/decoder circuit during inference is also greatly reduced.
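For readers unfamiliar with the baseline, the following is a minimal Python sketch of direct-indexing compression as described above: only the nonzero activations are kept, together with one stored offset per value that tells the decoder where to scatter it back. The function names and the flattened 1-D layout are illustrative assumptions, not the encoder/decoder hardware of this thesis.

```python
import numpy as np

def direct_index_encode(channel):
    """Direct-indexing compression of one flattened feature-map channel:
    keep only the nonzero activations and, for each of them, store its
    position (offset) so the decoder can rebuild the dense tensor."""
    flat = np.asarray(channel).ravel()
    offsets = np.flatnonzero(flat)      # one stored offset per nonzero value
    values = flat[offsets]
    return values, offsets, flat.size

def direct_index_decode(values, offsets, length):
    """Decoder: scatter every stored value back to its recorded offset."""
    flat = np.zeros(length, dtype=values.dtype)
    flat[offsets] = values
    return flat

# Example: a sparse activation map, e.g. after ReLU
fmap = np.array([0, 0, 3, 0, 0, 0, 7, 1, 0, 0, 0, 0, 5, 0, 0, 0])
vals, offs, n = direct_index_encode(fmap)
assert np.array_equal(direct_index_decode(vals, offs, n), fmap)
print(vals, offs)   # [3 7 1 5] [ 2  6  7 12]
```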

Parallel Abstract


In this thesis, we address the high operating frequency required by the decoder circuit of a direct indexing module and reduce the cost of offset storage. We design a new recompression format that effectively reduces the size of the stored offsets and propose a gating method for the corresponding decoder circuit. Compared with a general direct indexing circuit, the proposed approach achieves lower power consumption and fewer operations. Because feature maps inherit the spatial continuity of images, sparse or dense pixels within the same channel are likely to lie in nearby regions. To allow pixels in the same channel to share offsets and thus reduce the storage required for sparse compression, we use block compression to recompress the data and reduce the number of offset bits stored. Since actively sparsifying the weights degrades convolution model accuracy, and the zero-value ratio of the quantized weights is not high, we compress only the activation values. On the hardware side, we propose a compression circuit that uses the shared offsets of our compression format to gate the decoder circuit, thereby reducing the power consumption required to implement sparse compression. Experimental results show that our method effectively reduces the number of offset bits stored, and that the power consumption of the encoder/decoder circuit is also greatly reduced during inference.
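To make the recompression format concrete, the sketch below (a software analogy under assumed function names and an assumed block size, not the thesis's actual circuit) groups the direct-indexing output into fixed-size blocks: every nonzero in a block shares a single block-level offset and keeps only a short intra-block offset, and the decode loop is simply skipped, i.e. gated, for blocks that contain no nonzeros.

```python
import numpy as np

BLOCK = 8  # illustrative block size, not the value chosen in the thesis

def block_recompress(values, offsets):
    """Recompress the direct-indexing output: nonzeros falling in the same
    block share one block-level offset, and only short intra-block offsets
    are kept per value, so fewer offset bits are stored overall."""
    blocks = {}
    for v, off in zip(values, offsets):
        blocks.setdefault(int(off) // BLOCK, []).append((int(off) % BLOCK, int(v)))
    # one (shared block offset, [(intra-block offset, value), ...]) entry per nonzero block
    return sorted(blocks.items())

def gated_decode(blocks, length):
    """Decoder sketch: all-zero blocks never appear in the input, so the
    scatter loop is skipped ('gated') for them."""
    flat = np.zeros(length, dtype=int)
    for block_off, entries in blocks:
        base = block_off * BLOCK
        for intra_off, v in entries:
            flat[base + intra_off] = v
    return flat

fmap = np.array([0, 0, 3, 0, 0, 0, 7, 1] + [0] * 8 + [0, 0, 0, 0, 5, 0, 0, 0])
offs = np.flatnonzero(fmap)
blocks = block_recompress(fmap[offs], offs)
assert np.array_equal(gated_decode(blocks, fmap.size), fmap)
print(blocks)   # [(0, [(2, 3), (6, 7), (7, 1)]), (2, [(4, 5)])]
```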

