In recent years, convolutional neural networks (CNNs) have demonstrated their potential. As CNNs grow deeper, their number of weights rises sharply, making it increasingly difficult to port a trained CNN model to an embedded system, and research on reducing the number of weights has therefore emerged. This work adopts a software-hardware co-design approach. Besides implementing, in hardware, a circulant-matrix replacement for the original fully connected layer weights to reduce the storage burden, it also proposes excluding from computation any value that is zero after the Rectified Linear Unit (ReLU), reducing the overall amount of computation. Moreover, instead of the usual scheme of feeding in a new sample only after inference on the previous one has finished, the software side combines different sets of input data so that they are computed simultaneously, allowing the weights to be used effectively; Direct Memory Access (DMA) is also used to transfer data and reduce transfer time. Experimental results show that the proposed architecture saves about 40% of the time compared with the conventional approach, with a throughput of 3.2 GOPS.
In recent years, convolutional neural networks (CNNs) have shown their potential. As CNNs grow deeper, the number of weights increases significantly, making it harder to port a pre-trained CNN model to embedded systems; research on reducing the number of weights has therefore emerged. We adopt a software-hardware co-design approach. In addition to reducing the storage burden by implementing, in hardware, a circulant matrix in place of the original fully connected layer weights, we also propose excluding any value that is zero after the Rectified Linear Unit (ReLU) from computation, reducing the overall amount of calculation. Instead of the usual scheme of sending the next sample only after inference on the current one is complete, our software combines different sets of input data so that they are processed simultaneously, which also allows the weights to be used effectively. At the same time, we use Direct Memory Access (DMA) to transfer data and reduce transfer time. Experimental results show that the proposed architecture saves about 40% of the time compared with the general method, and achieves 3.2 GOPS.
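The two weight- and computation-reduction ideas above can be sketched in NumPy. This is an illustrative software model only, not the thesis's hardware design; the function names, shapes, and row-rotation convention are assumptions for the sketch.

```python
import numpy as np

def circulant_fc(first_row, x):
    """Fully connected layer whose weight matrix is circulant:
    row i is the first row rotated right by i positions, so only
    one row (n values instead of n*n) needs to be stored."""
    n = len(first_row)
    y = np.empty(n)
    for i in range(n):
        # np.roll(first_row, i)[j] == first_row[(j - i) % n]
        y[i] = np.dot(np.roll(first_row, i), x)
    return y

def relu_skip_matvec(W, a):
    """Matrix-vector product that skips columns of W whose input
    activation is zero after ReLU, mirroring the idea of excluding
    zero post-ReLU values from the computation."""
    a = np.maximum(a, 0.0)      # ReLU
    nz = np.nonzero(a)[0]       # indices of non-zero activations
    return W[:, nz] @ a[nz]     # multiply only the needed columns
```

In a sparse post-ReLU activation vector, `relu_skip_matvec` performs only as many multiply-accumulate steps as there are non-zero entries, which is the source of the computation savings the abstract describes.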