With the rapid development of artificial intelligence, convolutional neural networks (CNNs) are widely applied across AI fields such as machine learning, computer vision, and computational neuroscience. Although CNNs are powerful, they involve a large number of convolution operations, which in turn demand a large amount of memory traffic. In hardware implementations of CNNs, dataflow planning and the configuration of processing elements (PEs) and buffers are therefore commonly used to reduce memory traffic and energy consumption. In this thesis, we focus on buffer configuration and propose an algorithm that optimizes memory traffic. Based on the architecture of a given convolutional neural network, the algorithm first selects the layers whose buffers should be tuned, then flexibly adjusts the buffer configuration of each selected layer, and the results are evaluated with the SCALE-Sim simulator [15]. Based on the SCALE-Sim results, the proposed Rapid Test Algorithm adopts a machine-learning-like approach: it learns the characteristics of a previously analyzed network and applies the learned features to predict suitable configurations for other network architectures. Compared with the common approach of using a single fixed buffer configuration for the entire network, the Rapid Test Algorithm reduces memory traffic by about 50%, and it reduces the amount of computation by about 85% compared with a brute-force search.