Unfolded Accelerator Architecture and Software-Hardware Codesign for Convolutional Neural Networks

Advisor: 湯松年

Abstract


In this research, we propose a hardware architecture that accelerates the convolution operations of convolutional neural networks (CNNs), and we integrate it with software for communication and verification to confirm that the whole system operates correctly. We adopt a software-hardware codesign approach: the software and the hardware follow separate development flows and are integrated at the end. On the software side, we introduce fixed-point arithmetic to reduce the computational burden on the hardware, and we measure the signal-to-quantization-noise ratio (SQNR) to determine a suitable hardware specification. We implement the architecture on the EGO-X27 development board, which is built around a Xilinx Zynq-7000 (XC7Z020-CLG484-1) system-on-chip (SoC). The SoC contains an ARM Cortex-A9 CPU as the processing system (PS), which runs the software, preprocesses the data, and controls the hardware, and an FPGA as the programmable logic (PL), in which we implement the architecture using a hardware description language (HDL). Running at 50 MHz on a CNN with 61K MACs, the proposed architecture achieves a throughput of 16.1 GOPS and uses 161 DSP modules, a significant improvement over a single convolution accelerator. When images are fed in consecutively, the architecture can operate in a pipelined fashion so that the output is produced at the pixel rate.
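The abstract states that the SQNR is measured to fix the hardware's fixed-point specification but gives no procedure. The following is a minimal sketch of how such a measurement could look, not the thesis's actual flow: the 16-bit word length, the uniformly distributed stand-in data, and the names `quantize`, `sqnr_db`, and `sweep_fraction_bits` are all assumptions made for illustration.

```python
import math
import random


def quantize(x: float, frac_bits: int, total_bits: int = 16) -> float:
    """Round x to signed fixed point (assumed 16-bit word) with saturation."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(x * scale)))
    return code / scale


def sqnr_db(signal: list[float], quantized: list[float]) -> float:
    """SQNR in dB: 10 * log10(signal power / quantization-noise power)."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    if p_noise == 0.0:
        return math.inf  # the signal is exactly representable
    return 10.0 * math.log10(p_signal / p_noise)


def sweep_fraction_bits(signal: list[float], frac_range: range,
                        total_bits: int = 16) -> dict[int, float]:
    """Return the SQNR in dB for each candidate number of fraction bits."""
    return {
        f: sqnr_db(signal, [quantize(s, f, total_bits) for s in signal])
        for f in frac_range
    }


if __name__ == "__main__":
    # Stand-in data: uniform samples in [-1, 1). Real weights or
    # activations from the trained network would be used instead.
    rng = random.Random(0)
    samples = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
    for f, db in sweep_fraction_bits(samples, range(4, 13)).items():
        print(f"{f:2d} fraction bits: {db:6.2f} dB")
```

Each extra fraction bit halves the quantization step and so raises the SQNR by roughly 6 dB, as long as no value saturates. A sweep of this kind lets the word length be chosen as the smallest one that meets a target SQNR.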
