This paper proposes a high-performance Deep Convolutional Neural Network (DCNN) hardware architecture composed of three major parts. The first part is the Convolution Operation Unit (COU), which employs a Processing Element (PE) array to perform convolution operations efficiently. The second part is the COU Management, which controls the PE array and keeps the PEs working in their most efficient state. The third part is the Storage and Accumulation Unit (SAU), which stores and accumulates the partial sums produced during convolution. We implemented the design in TSMC 40nm General technology. Experimental results show that it delivers 32.2 GOPS for AlexNet [1] at a 200 MHz clock rate, with a total memory cost of 134 KB. Compared with [4], our design reduces the memory size by 26.17% and speeds up convolutional computation by 39.94%, at lower hardware cost.
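To make the role of partial sums concrete, the following is a minimal software sketch, not the proposed hardware. It assumes the usual multi-channel convolution in which each input channel contributes a partial result that is added into one output feature map, which is the kind of accumulation attributed to the SAU. The function names (`conv2d_single_channel`, `accumulate_partial_sums`, `ops_per_cycle`) and the toy data are hypothetical and chosen only for illustration. The last helper simply restates the reported throughput as operations per clock cycle.

```python
def conv2d_single_channel(image, kernel):
    """Valid 2-D convolution (CNN-style cross-correlation) of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]


def accumulate_partial_sums(channels, kernels):
    """Add the per-channel partial results into a single output map.

    Assumption: one kernel per input channel, all producing same-sized maps.
    """
    acc = None
    for image, kernel in zip(channels, kernels):
        partial = conv2d_single_channel(image, kernel)
        if acc is None:
            acc = partial  # first partial sum initializes the accumulator
        else:
            acc = [[a + p for a, p in zip(row_a, row_p)]
                   for row_a, row_p in zip(acc, partial)]
    return acc


def ops_per_cycle(gops, clock_mhz):
    """Convert a throughput in GOPS and a clock in MHz to operations per cycle."""
    return gops * 1e9 / (clock_mhz * 1e6)


if __name__ == "__main__":
    # Toy data: two 3x3 input channels, each with its own 2x2 kernel.
    channels = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    ]
    kernels = [
        [[1, 0], [0, 1]],
        [[1, 1], [1, 1]],
    ]
    print(accumulate_partial_sums(channels, kernels))  # [[8, 10], [14, 16]]
    # Reported figures: 32.2 GOPS at 200 MHz, i.e. about 161 operations per cycle.
    print(round(ops_per_cycle(32.2, 200)))  # 161
```

In this sketch the accumulator holds one full output map in software. How the actual design partitions, buffers, and schedules the partial sums is not specified here and is not implied by the example.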