

Improving Convolutional Neural Networks by Separable Filters

Advisor: 李哲榮

Abstract


Convolutional neural networks are among the most widely used deep learning architectures. They achieve superior recognition performance, especially on images, but their training remains a computational challenge that limits their practical use: even GPUs with great computational power may take days to produce results. In this thesis, we propose a method based on separable filters to reduce the training time. First, each 2D filter in the convolutional neural network is approximated by the product of two 1D filters obtained from its SVD. Second, two 1D convolutions with these 1D filters replace the original 2D convolution, which effectively reduces the amount of computation. In our GPU implementation, we present a batched SVD that can decompose many small matrices simultaneously, along with three convolution methods that use different memory spaces according to the filter size in order to improve efficiency. Our experimental results show that a 1.38x ∼ 2.66x speedup was achieved in the forward and backward passes. The overall training time was reduced by 13%, with a 1% drop in recognition accuracy.
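To make the idea concrete, below is a minimal NumPy sketch, not the thesis's GPU implementation, of the two steps described above: an SVD-based rank-1 approximation of a 2D filter, followed by two 1D convolutions that reproduce the rank-1 2D convolution exactly. For a d x d filter, direct 2D convolution costs d*d multiplies per output pixel while the separable version costs 2*d; for d = 5 that is 25 versus 10. The filter size, input size, random data, and the use of scipy.signal.convolve2d are illustrative assumptions.

```python
# Illustrative sketch only: SVD-based separable approximation of a 2D filter.
# Sizes and random data are assumptions for demonstration, not thesis settings.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
d = 5
W = rng.standard_normal((d, d))        # a 2D convolution filter (d x d)
x = rng.standard_normal((32, 32))      # an input feature map

# Step 1: SVD, keep the dominant singular triplet -> two 1D filters.
U, S, Vt = np.linalg.svd(W)
u = U[:, 0] * np.sqrt(S[0])            # vertical (column) 1D filter
v = Vt[0, :] * np.sqrt(S[0])           # horizontal (row) 1D filter
W1 = np.outer(u, v)                    # rank-1 approximation of W

# Step 2: two 1D convolutions replace one 2D convolution.
y_2d = convolve2d(x, W1, mode="valid")                   # d*d multiplies/pixel
y_sep = convolve2d(convolve2d(x, u[:, None], mode="valid"),
                   v[None, :], mode="valid")             # 2*d multiplies/pixel
assert np.allclose(y_2d, y_sep)        # separable form is exact for rank-1 W1

# The accuracy cost comes only from truncating W to rank 1:
y_full = convolve2d(x, W, mode="valid")
print("relative error vs. full filter:",
      np.linalg.norm(y_full - y_sep) / np.linalg.norm(y_full))

# Batched SVD: NumPy applies SVD across a stack of small matrices, analogous
# in spirit to the batched GPU SVD the thesis describes (64 filters assumed).
filters = rng.standard_normal((64, d, d))
Ub, Sb, Vtb = np.linalg.svd(filters)   # Ub: (64,d,d), Sb: (64,d), Vtb: (64,d,d)
```

In practice, the approximation error depends on how close each trained filter is to rank 1, which is why the thesis reports a small (1%) drop in recognition accuracy alongside the speedup.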


