High-Efficiency Hardware Accelerator for Binarized Convolutional Neural Networks (應用於二值化卷積神經網路之高效率硬體加速器)

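Binary neural networks (BNNs) are an active research topic and continue to improve, extending their use in computer vision tasks such as recognition, object detection, and depth perception. However, most existing designs suffer from low hardware utilization or overly complex circuits, which lead to high hardware cost. In addition, BNN inference still contains a large amount of computational redundancy. To address these hardware-utilization and computational-complexity problems, this design adopts a systolic array architecture that takes binarized inputs and weights. Because weights and activations can each be stored as a single bit (+1 is stored as 1 and -1 is stored as 0), computational complexity is greatly reduced, and replacing multiply-accumulate (MAC) operations with bitwise operations resolves the remaining computational problem. The design uses 8 processing elements (PEs), each paired with its own accumulator and operating in parallel, and each PE block performs a convolution with a 3x3 kernel. Throughput is increased, with an operating frequency of up to 188.67 MHz and a minimum of 125 MHz. Our results show that with 8 PEs the design achieves 12.85 GOPS and is 10x more area efficient than the compared works. The power consumption obtained by simulation after RTL synthesis is 14 mW. The architecture was successfully implemented on a Spartan-6 FPGA using Xilinx ISE 14.7. Compared with other state-of-the-art works, this design offers better area and bandwidth efficiency.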
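The single-bit encoding and the replacement of MAC operations with bitwise operations described above are commonly realized as an XNOR followed by a popcount. The Python sketch below is a minimal software model under that assumption; the abstract does not name the exact bitwise operations, and the function names `pack_bits`, `xnor_popcount_dot`, and `mac_dot` are hypothetical. It illustrates that, for vectors of ±1 values, 2·popcount(XNOR(a, w)) - n equals the ordinary multiply-accumulate result.

```python
# Minimal sketch: binarized dot product via XNOR + popcount.
# Assumes the encoding stated in the abstract: +1 -> bit 1, -1 -> bit 0.
# Using XNOR + popcount is an assumption; function names are hypothetical.

def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 if values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two n-element +/-1 vectors stored as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 where the signs agree
    p = bin(agree).count("1")           # popcount of agreeing positions
    return 2 * p - n                    # (#agree) - (#disagree)

def mac_dot(a, w):
    """Reference multiply-accumulate on +/-1 values."""
    return sum(x * y for x, y in zip(a, w))

if __name__ == "__main__":
    a = [1, -1, 1, 1, -1, -1, 1, 1, -1]   # one 3x3 activation window, flattened
    w = [1, 1, -1, 1, -1, 1, -1, 1, 1]    # one 3x3 kernel, flattened
    print(mac_dot(a, w), xnor_popcount_dot(pack_bits(a), pack_bits(w), 9))
```

For a 3x3 window (n = 9) both functions return the same value. In hardware, the XNOR maps to a row of gates and the popcount is typically built as an adder tree, which is why this substitution removes the multipliers.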

Keywords

Machine learning; FPGA; Convolutional Neural Network; Binary Neural Network; Accelerator

Parallel Abstract

Binary neural networks (BNNs) are an active research topic and are steadily improving, broadening their use in computer vision tasks such as recognition, object detection, and depth perception. However, most existing designs suffer from low hardware utilization or complex circuits that result in high hardware cost. In addition, a large amount of computational redundancy still exists in BNN inference. To overcome these hardware-utilization and computational-complexity issues, this design adopts a systolic array architecture that takes binarized inputs and weights. Computational complexity is drastically reduced because weights and activations can each be stored as a single bit, i.e., +1 is stored as 1 and -1 is stored as 0. In addition, the computational problem is resolved by replacing multiply-accumulate (MAC) operations with bitwise operations. The design uses eight processing elements (PEs), each operating in parallel with its own accumulator, and each PE block performs convolution with a 3x3 kernel. Throughput is increased, with an operating frequency of up to 188.67 MHz and a minimum of 125 MHz. Our results show that with eight PEs the design achieves 63.168 GOPS and is 10x more area efficient than the compared works. The power consumption obtained from simulation after RTL synthesis is 0.014 W. The architecture was successfully implemented in Xilinx ISE 14.7 on a Spartan-6 series FPGA. The design also shows better area and bandwidth efficiency than other state-of-the-art works.
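How the eight PEs and their accumulators cooperate is not detailed in the abstract. The behavioral Python sketch below assumes one plausible mapping: every PE holds a different 3x3 binary kernel (one output channel), all PEs receive the same input window, and each PE's accumulator sums XNOR-popcount results across input channels. It is not cycle-accurate and does not model the systolic data movement; `NUM_PES`, `window_bits`, and `binary_conv_8pe` are hypothetical names.

```python
# Behavioral sketch (not cycle-accurate) of 8 PEs, each holding one 3x3 binary
# kernel per input channel and its own accumulator. Assumption: PE p computes
# output channel p; the real dataflow of the design is not specified here.

NUM_PES = 8
K = 3

def popcount(x):
    return bin(x).count("1")

def window_bits(fmap, r, c):
    """Pack the 3x3 window at (r, c) of a +/-1 feature map into 9 bits (row-major)."""
    word = 0
    for i in range(K):
        for j in range(K):
            if fmap[r + i][c + j] == 1:
                word |= 1 << (i * K + j)
    return word

def binary_conv_8pe(fmaps, kernels):
    """fmaps: list of C input maps (H x W, values +/-1).
    kernels: NUM_PES lists, each holding C packed 9-bit kernels (one list per PE).
    Returns NUM_PES output maps of accumulator values (no padding, stride 1)."""
    assert len(kernels) == NUM_PES
    h, w = len(fmaps[0]), len(fmaps[0][0])
    n = K * K
    mask = (1 << n) - 1
    out = [[[0] * (w - K + 1) for _ in range(h - K + 1)] for _ in range(NUM_PES)]
    for r in range(h - K + 1):
        for c in range(w - K + 1):
            # The same input windows are shared by all PEs.
            windows = [window_bits(f, r, c) for f in fmaps]
            for pe in range(NUM_PES):       # modeled sequentially; parallel in hardware
                acc = 0                      # this PE's accumulator
                for ch, win in enumerate(windows):
                    agree = popcount(~(win ^ kernels[pe][ch]) & mask)
                    acc += 2 * agree - n     # XNOR-popcount result for one channel
                out[pe][r][c] = acc
    return out

if __name__ == "__main__":
    ones = [[1] * 4 for _ in range(4)]
    ks = [[0b111111111] if p % 2 == 0 else [0] for p in range(NUM_PES)]
    print(binary_conv_8pe([ones], ks)[0])   # PE 0: all-(+1) kernel on an all-(+1) map
```

Under this mapping, the eight inner iterations correspond to the eight parallel PEs, and adding input channels only lengthens each accumulator's sum rather than requiring more PEs.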
