A number of recent studies have targeted accelerators for Convolutional Neural Networks (CNNs) in pursuit of better performance than the current Von Neumann architecture provides. As computing power keeps growing while memory speed grows comparatively slowly, data movement between computing units and memory devices has become the bottleneck limiting the performance of these accelerators. To reduce data movement, the Processing-In-Memory (PIM) architecture has been studied and advocated in many papers. However, PIM architectures can only reduce data movement between off-chip memory and the chip; the movement between on-chip buffers and computing units remains. To further reduce data movement in this segment, this thesis proposes a kernel-unfolding method that trades on-chip computing power for higher utilization of the input feature map data, reducing the volume of input data moved without re-inputting the same data. The proposed method achieves improvements of 16.2×, 1.62×, and 19.2× in cycle count, execution time, and input data volume, respectively.
A number of recent works have aimed to design accelerators for Convolutional Neural Networks (CNNs) that outperform the current Von Neumann architecture. As computing power keeps growing while memory speed grows relatively slowly, data movement between computing units and memory devices has become the bottleneck that limits the performance of these accelerators. To reduce this data movement, the Processing-In-Memory (PIM) architecture is widely advocated. However, PIM can only decrease off-chip data movement; the movement between on-chip buffers and computing units remains. To further decrease this on-chip data movement, we propose a kernel-unfolding approach that trades computing power for a lower volume of input data movement by fully utilizing the input feature map data without inputting overlapped data. The proposed approach yields improvements of 16.2×, 1.62×, and 19.2× in cycle count, execution time, and input data volume, respectively.
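To give an intuition for the reported input-data reduction, the following is a minimal sketch, not the thesis's actual method: it counts, for an assumed 1D convolution with stride 1, how many input elements must be streamed on-chip when every output window re-reads its overlapping inputs, versus when each input element is sent on-chip only once and reused. The function names and parameters are hypothetical, chosen purely for illustration.

```python
# Hypothetical illustration (not the thesis's implementation): input-data
# traffic for a 1D convolution, comparing per-window re-reading of
# overlapped inputs against overlap-free, read-each-element-once input.

def naive_input_traffic(n, k, stride=1):
    """Elements streamed when each of the output windows re-reads its k inputs."""
    n_out = (n - k) // stride + 1
    return n_out * k

def reuse_input_traffic(n):
    """Elements streamed when every input element enters the chip exactly once."""
    return n

n, k = 1024, 16
naive = naive_input_traffic(n, k)
reused = reuse_input_traffic(n)
# For stride 1, the savings factor approaches the kernel size k.
print(naive, reused, naive / reused)
```

Under these assumptions the savings grow with the kernel size, which is consistent in spirit with trading computation for fewer overlapped input transfers, though the 19.2× figure above comes from the thesis's own evaluation, not from this sketch.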