
可適性乘積量化方法用於有效深度學習模型壓縮

Adaptive Product Quantization for Effective Model Compression

Advisor: 陳銘憲

Abstract


In this thesis, we propose a generalized product quantization algorithm for neural network compression. Compared with scalar quantization, product quantization has the potential to achieve extremely high compression rates. However, product quantization is subject to a block-size constraint, which makes it challenging to find suitable quantization parameters within a given memory budget. To overcome this limitation, we propose adaptive padding, which allows product quantization to use blocks of arbitrary size and makes the model compression process more flexible. Adaptive padding is orthogonal to previous optimization-based product quantization approaches. In addition, we adopt a simple method to determine a suitable block size for each layer of the model, achieving better quantization results. Experimental results show that our method generalizes product quantization without a noticeable drop in accuracy and can be combined with previous approaches to deliver effective performance gains.

Keywords

Model compression, product quantization

Abstract (English)


In this thesis, we propose a generalized product quantization (PQ) algorithm for neural network compression. Compared to scalar quantization, PQ offers the potential to achieve an extremely high compression rate. However, PQ's block-size constraint poses a challenge in finding an appropriate quantization configuration under a restricted storage budget. To overcome this limitation, we propose adaptive padding, an algorithm that enables PQ to be applied with arbitrary block sizes and makes the compression rate of a quantized model more flexible. Adaptive padding is orthogonal to previous PQ approaches, which focus on better optimization. Moreover, we employ a simple approach to determine a suitable block size for each layer. Experimental results demonstrate that our method generalizes PQ without additional accuracy drops and effectively enhances performance when incorporated into existing PQ works.
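To illustrate the basic mechanism the abstract describes, the sketch below product-quantizes a flat weight vector: it pads the vector so it divides evenly into sub-vectors of an arbitrary block size, then clusters the sub-vectors into a small codebook with plain k-means. This is a minimal illustration only; the zero-padding used here is a placeholder assumption, not the thesis's adaptive padding strategy, and the function name `quantize_layer_pq` is hypothetical.

```python
import numpy as np

def quantize_layer_pq(weights, block_size, n_centroids=8, n_iter=20, seed=0):
    """Product-quantize a weight tensor with padding for arbitrary block sizes.

    Returns (codes, codebook, reconstruction). Zero-padding stands in for the
    thesis's adaptive padding; k-means (Lloyd's algorithm) builds the codebook.
    """
    rng = np.random.default_rng(seed)
    flat = weights.ravel().astype(np.float64)

    # Pad so the length is a multiple of block_size (placeholder: zeros).
    pad = (-len(flat)) % block_size
    padded = np.concatenate([flat, np.zeros(pad)])
    blocks = padded.reshape(-1, block_size)  # one row per sub-vector

    # Plain k-means over the sub-vectors.
    centroids = blocks[rng.choice(len(blocks), n_centroids, replace=False)]
    for _ in range(n_iter):
        dists = ((blocks[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes = dists.argmin(1)
        for k in range(n_centroids):
            if (codes == k).any():
                centroids[k] = blocks[codes == k].mean(0)

    # Each block is replaced by its nearest centroid; drop the padding.
    recon = centroids[codes].ravel()[: len(flat)].reshape(weights.shape)
    return codes, centroids, recon
```

Storing only the per-block codes plus the small codebook, rather than the full weights, is what yields PQ's high compression rate; the padding step is what removes the divisibility restriction on block size.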

