
Platform-aware Network Pruning with Architecture Generator and Search

Advisor: Shao-Yi Chien (簡韶逸)


Abstract


Convolutional neural networks (CNNs) are widely used in computer vision. The large number of parameters to be stored and the high computational complexity of CNNs mean that they require platforms with abundant resources for some scenarios, e.g., real-time applications. To deploy these computation-intensive networks on platforms with limited resources, researchers have developed many network compression techniques that reduce resource consumption while maintaining performance. The key to a successful compression method is to yield the network with the highest performance under the given resource constraints (e.g., parameters, inference latency). Pruning is an effective method for network compression: it estimates the importance of each filter and eliminates those that are less important until the constraints are met. Existing methods consider only the total number of parameters or floating-point operations (FLOPs) of a network as constraints, but these metrics neglect how networks are actually executed on target platforms. In this thesis, platform characteristics such as inference latency are introduced into the pruning metrics. We propose a novel platform-aware filter pruning method, called Platform-aware Architecture Generator and Search (PAGS), that generates network architectures under a given latency constraint and enlarges the overall architecture search space. In the search stage, we search for the best pruned structure within the candidate set constructed by our generator. Finally, a typical pruning procedure prunes the pre-trained model to the best pruned structure and fine-tunes it to recover performance. Extensive experiments demonstrate that under the same latency constraint, our method achieves better performance and lower latency than state-of-the-art methods.
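The filter-importance step described above can be sketched with the common L1-norm criterion (in the spirit of magnitude-based filter pruning); this is a minimal illustration, not the thesis's actual implementation, and the layer shape and keep ratio are made-up example values:

```python
import numpy as np

def filter_importance(weights: np.ndarray) -> np.ndarray:
    """L1 norm of each output filter; weights shape: (out_c, in_c, kH, kW)."""
    return np.abs(weights).sum(axis=(1, 2, 3))

def filters_to_keep(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Indices of the highest-scoring filters, returned in ascending order."""
    scores = filter_importance(weights)
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    return np.sort(np.argsort(scores)[-n_keep:])

# Example: keep half the filters of a toy 8-filter conv layer.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3, 3, 3))
keep = filters_to_keep(weights, keep_ratio=0.5)
pruned = weights[keep]  # retained filters only
```

In a full pruning pipeline, this ranking would be applied per layer and the corresponding input channels of the next layer would be removed as well, before fine-tuning recovers accuracy.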
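The generator-and-search idea — sampling candidate architectures, keeping only those that meet a latency budget, then picking the best candidate — can be sketched as follows. The per-layer latency model, layer widths, and proxy score here are illustrative assumptions (a real system would measure latency on the target platform or use a lookup table, and score candidates by validation accuracy):

```python
import random

# Assumed per-layer cost at full width and original channel counts (toy values).
BASE_LAT_MS = [0.8, 1.2, 1.5]
FULL_WIDTH = [64, 128, 256]

def estimate_latency(widths):
    """Hypothetical latency model: each layer's cost scales with its width."""
    return sum(b * w / f for b, w, f in zip(BASE_LAT_MS, widths, FULL_WIDTH))

def generate_candidates(latency_budget_ms, n=100, seed=0):
    """Randomly sample per-layer widths; keep those meeting the budget."""
    rng = random.Random(seed)
    cands = []
    for _ in range(n):
        widths = [max(1, int(f * rng.uniform(0.25, 1.0))) for f in FULL_WIDTH]
        if estimate_latency(widths) <= latency_budget_ms:
            cands.append(widths)
    return cands

def proxy_score(widths):
    """Stand-in for validation accuracy: wider layers score higher here."""
    return sum(w / f for w, f in zip(widths, FULL_WIDTH))

budget = 2.0
candidates = generate_candidates(budget)
best = max(candidates, key=proxy_score)
```

The pruned structure selected this way is guaranteed to satisfy the latency constraint by construction, which is the practical advantage of searching over generated candidates rather than pruning by parameter or FLOP counts alone.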

