HIGH-PERFORMANCE VLSI DESIGN FOR CONVOLUTION LAYER OF DEEP LEARNING NEURAL NETWORKS

This paper proposes a high-performance Deep Convolutional Neural Network (DCNN) hardware architecture composed of three major parts. The first is the Convolution Operation Unit (COU), which employs a Processing Element (PE) array to perform convolution operations efficiently. The second is the COU Management, which controls the PE array and keeps the PEs operating in their most efficient state. The third is the Storage and Accumulation Unit (SAU), which stores and accumulates the partial sums produced during convolution. We implemented the design in TSMC 40 nm General technology. Experimental results show that it delivers 32.2 GOPS on AlexNet [1] at a 200 MHz clock rate, with a total memory cost of 134 kbytes. Compared with [4], the design reduces memory size by 26.17% and speeds up convolutional computation by 39.94%, at lower hardware cost.
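As a rough sanity check on the reported figures, the short Python sketch below derives two quantities that the abstract does not state directly: the average number of operations completed per clock cycle, and the memory size of the baseline design [4] implied by the 26.17% reduction. The function names are hypothetical, and the baseline calculation assumes the reduction is defined as 1 - (our memory / baseline memory); neither derived value is a number reported by the authors.

```python
# Figures taken from the abstract.
THROUGHPUT_GOPS = 32.2     # reported AlexNet throughput
CLOCK_MHZ = 200.0          # reported clock rate
MEMORY_KBYTES = 134.0      # reported total memory cost
MEMORY_REDUCTION = 0.2617  # reported reduction relative to [4]


def ops_per_cycle(gops: float, clock_mhz: float) -> float:
    """Average operations completed per clock cycle."""
    return (gops * 1e9) / (clock_mhz * 1e6)


def implied_baseline_kbytes(memory_kbytes: float, reduction: float) -> float:
    """Baseline memory size implied by a fractional reduction.

    ASSUMPTION: reduction = 1 - memory_kbytes / baseline.
    """
    return memory_kbytes / (1.0 - reduction)


if __name__ == "__main__":
    print(f"{ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ):.1f} ops/cycle")
    print(f"{implied_baseline_kbytes(MEMORY_KBYTES, MEMORY_REDUCTION):.1f} kbytes")
```

Under these assumptions, the design averages about 161 operations per cycle, and the baseline in [4] would hold roughly 181.5 kbytes.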

Keywords

convolutional neural networks (CNN); deep learning; CNN hardware accelerator
