An Unrolled Accelerator Architecture and Hardware/Software Co-Design for Convolutional Neural Networks

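In this study, we propose a hardware architecture dedicated to accelerating the convolution operations in convolutional neural networks (CNNs), and we integrate it with software for communication and verification to confirm that the whole system operates correctly. In the design stage we adopt a hardware/software co-design approach: software and hardware each follow their own development flow and are integrated at the end. In the software design we introduce a fixed-point representation to reduce the computational burden on the hardware, and we measure a suitable signal-to-quantization-noise ratio (SQNR) to set the hardware specification. Finally, we implement the architecture on the EGO-X27 development board, which uses a Xilinx Zynq-7000 (XC7Z020-CLG484-1) system-on-chip (SoC). This SoC contains an ARM CPU serving as the processing system (PS), which runs the software, preprocesses data, and controls the hardware, and an FPGA fabric serving as the programmable logic (PL), in which we implement our hardware architecture in a hardware description language (HDL). Running at 50 MHz on a network of 61 KMACs, the architecture achieves a throughput of 16.1 GOPS using 161 DSP modules, a significant improvement over a single convolution accelerator. With consecutive input images, the architecture can be pipelined so that the output reaches pixel-rate performance.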
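As a rough consistency check on the reported figures, the peak throughput can be estimated from the DSP count and the clock. This assumes, and the abstract does not state it, that each DSP module completes one multiply-accumulate (MAC) per clock cycle and that one MAC is counted as two operations:

```latex
\[
161\ \text{MACs/cycle} \times 2\ \tfrac{\text{ops}}{\text{MAC}} \times 50 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}}
= 16.1 \times 10^{9}\ \tfrac{\text{ops}}{\text{s}}
= 16.1\ \text{GOPS}
\]
```

Under these assumptions the 50 MHz clock, the 161 DSP modules, and the 16.1 GOPS throughput agree with one another.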

Keywords

Convolutional neural network; hardware acceleration; hardware/software co-design; FPGA

Parallel Abstract

In this research, we propose a hardware architecture for accelerating the convolution computation in convolutional neural networks (CNNs). The hardware communicates with our software system and is verified against it to ensure that the whole system works correctly. In the design stage, we adopt a software-hardware co-design method, with separate development flows for the software and the hardware. We use fixed-point arithmetic in the software design to reduce the burden on the hardware, and we measure the appropriate signal-to-quantization-noise ratio (SQNR) to formulate the hardware specification. We implement the proposed architecture on the EGO-X27 evaluation board, which features a Xilinx Zynq-7000 (XC7Z020-CLG484-1) system-on-chip (SoC). Within this SoC, an ARM Cortex-A9 CPU serves as the processing system (PS), used for data pre-processing and hardware control, and the FPGA fabric serves as the programmable logic (PL), where we implement our hardware architecture in a hardware description language (HDL). The proposed architecture runs at 50 MHz on a convolutional neural network of 61 KMACs, achieves a throughput of 16.1 GOPS, and uses 161 DSP modules. Compared with a single convolution accelerator, the proposed architecture shows a significant improvement. With consecutive input images, the architecture can operate in a pipelined fashion so that the output reaches pixel-rate performance.
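The SQNR measurement mentioned above can be illustrated with a minimal sketch. It is not the thesis's actual flow: the function names `quantize_fixed_point` and `sqnr_db`, the uniformly distributed test data, and the word lengths tried are all assumptions chosen for illustration. The idea is to quantize floating-point reference values to a signed fixed-point format, then compare signal power to quantization-noise power for several fractional bit widths.

```python
import math
import random


def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value with saturation.

    total_bits includes the sign bit; frac_bits is the number of
    fractional bits (both are illustrative parameters).
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    code = max(q_min, min(q_max, round(x * scale)))  # saturate to the word length
    return code / scale


def sqnr_db(reference, quantized):
    """SQNR in dB: 10 * log10(signal power / quantization-noise power)."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test data: values uniformly distributed in [-1, 1].
random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(10000)]

results = {}
for frac_bits in (4, 8, 12):
    total_bits = frac_bits + 2  # sign bit + one integer bit (assumed format)
    quantized = [quantize_fixed_point(s, total_bits, frac_bits) for s in samples]
    results[frac_bits] = sqnr_db(samples, quantized)
    print(f"{frac_bits} fractional bits: SQNR = {results[frac_bits]:.1f} dB")
```

In a sweep like this, each additional fractional bit adds roughly 6 dB of SQNR, so a designer could pick the smallest word length whose SQNR meets the accuracy target and use it as the hardware datapath width.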

Parallel Keywords

Convolutional neural networks; hardware accelerator; software-hardware co-design; FPGA
