
Rapid Test Algorithm: Reducing Memory Traffic in CNNs by Using Flexible Buffer Configuration

Advisor: 鄭維凱
This thesis will be available for download on 2025/08/20.

Abstract


With the rapid development of artificial intelligence, convolutional neural networks (CNNs) are widely used across AI fields such as machine learning, computer vision, and computational neuroscience. Although CNNs are powerful, they entail a large number of convolution operations and, with them, a large amount of memory traffic. In hardware implementations, dataflow planning, processing-element (PE) arrays, and buffer configuration are therefore commonly used to reduce memory traffic and energy consumption. In this thesis, we take buffer configuration as the main axis and propose an algorithm that optimizes memory traffic. Given a CNN architecture, the algorithm selects the layers whose buffers should be reconfigured, flexibly adjusts the buffer configuration of those layers, and evaluates the result with the SCALE-Sim simulator [15]. Based on the SCALE-Sim results, the proposed Rapid Test Algorithm uses a machine-learning-like technique: it learns features from a previously analyzed network and applies them to predict good configurations for other network architectures. Compared with the common approach of using a single fixed buffer configuration for the entire network, the Rapid Test Algorithm reduces memory traffic by about 50%, and it requires about 85% less computation than a brute-force search.
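The search described in the abstract can be sketched in a few lines: a brute-force pass simulates every buffer configuration for every layer, while a rapid-test-style pass learns a decision from layers it has already simulated and reuses it for similar layers. Everything below (the configuration names, the crude dominance feature, and the toy cost model standing in for SCALE-Sim) is a hypothetical illustration, not the thesis's actual implementation.

```python
# Illustrative sketch of a "rapid test"-style search over per-layer buffer
# configurations. The config names, layer features, and cost model are
# assumptions for illustration, not SCALE-Sim's interface.

CONFIGS = ["ifmap_heavy", "filter_heavy", "balanced"]  # hypothetical buffer splits

def simulated_traffic(layer, config):
    """Stand-in for one SCALE-Sim run: a toy linear cost model (assumption)."""
    ifmap, filt = layer  # (input-feature-map volume, filter volume)
    w_ifmap, w_filt = {"ifmap_heavy": (0.5, 1.5),
                       "filter_heavy": (1.5, 0.5),
                       "balanced": (1.0, 1.0)}[config]
    return w_ifmap * ifmap + w_filt * filt

def brute_force(layers):
    """Simulate every config for every layer: len(layers) * len(CONFIGS) runs."""
    return [min(CONFIGS, key=lambda c: simulated_traffic(l, c)) for l in layers]

def rapid_test(layers, learned=None):
    """Reuse decisions keyed on a crude layer feature; simulate only layers
    whose feature was not seen before (analogous in spirit to the thesis's
    feature-based prediction; the feature itself is an assumption)."""
    learned = {} if learned is None else learned
    choices, runs = [], 0
    for l in layers:
        feature = "ifmap_dominant" if l[0] >= l[1] else "filter_dominant"
        if feature not in learned:
            runs += len(CONFIGS)  # only unseen layer shapes are simulated
            learned[feature] = min(CONFIGS, key=lambda c: simulated_traffic(l, c))
        choices.append(learned[feature])
    return choices, runs
```

Under this toy cost model, `rapid_test` picks the same configurations as `brute_force` on a four-layer network while performing half the simulation runs, which mirrors the abstract's claim of a large reduction in search cost relative to brute force (the actual 85% figure comes from the thesis's experiments, not this sketch).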

References


[1] A. AbuNaser, I. Abu Doush, N. Mansour and S. Alshattnawi, "Underwater Image Enhancement Using Particle Swarm Optimization," Journal of Intelligent Systems, 2014, doi: 10.1515/jisys-2014-0012.
[2] Y. Chen, J. Emer and V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, 2016, pp. 367-379, doi: 10.1109/ISCA.2016.40.
[3] Y. Chen, T. Krishna, J. Emer and V. Sze, "14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," 2016 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 262-263, doi: 10.1109/ISSCC.2016.7418007.
[4] L. Deng et al., "Recent advances in deep learning for speech research at Microsoft," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 2013, pp. 8604-8608, doi: 10.1109/ICASSP.2013.6639345.
[5] R. C. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization," Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512), La Jolla, CA, USA, 2000, pp. 84-88 vol.1, doi: 10.1109/CEC.2000.870279.
