
基於非揮發記憶體之卷積神經網路加速器的展開方法

A Kernel Unfolding Approach over NVM Crossbar Accelerators for Convolutional Neural Networks

Advisor: 郭大維
Co-advisor: 張原豪

Abstract


Many recent studies have targeted accelerators for Convolutional Neural Networks (CNNs) in pursuit of better performance than the current Von Neumann architecture can offer. As computing power keeps growing while memory speed grows comparatively slowly, data movement between computing units and memory devices has become the bottleneck that limits the performance of these accelerators. To reduce this data movement, the Processing-In-Memory (PIM) architecture has been studied and advocated in many papers. However, PIM can only reduce data movement between the chip and off-chip memory; the movement between on-chip buffers and computing units remains. To further reduce data movement in this segment, this thesis proposes a kernel-unfolding method that trades on-chip computing capacity for higher utilization of input feature map data, cutting the volume of input data moved without ever feeding in the same data twice. The proposed method achieves improvements of 16.2x, 1.62x, and 19.2x in clock-cycle reduction, execution-time reduction, and input-data-volume reduction, respectively.

Parallel Abstract


A number of recent works have aimed to design accelerators for Convolutional Neural Networks (CNNs) that improve on the performance of the current Von Neumann architecture. As computing power keeps growing and memory speed grows relatively slowly, data movement between computing units and memory devices has become a bottleneck that limits the performance of these accelerators. To reduce this data movement, the Processing-In-Memory (PIM) architecture is widely advocated. However, PIM can only decrease off-chip data movement. To further decrease on-chip data movement, we propose a kernel-unfolding approach that trades computing power for a lower volume of input data movement by fully utilizing the input feature map data, so that no overlapping data has to be input again. The proposed approach yields improvements of 16.2x, 1.62x, and 19.2x in cycles used, execution time, and input data volume, respectively.
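To make the described trade-off concrete, the sketch below is our own illustration, not code from the thesis: it builds a Toeplitz-style unfolded weight matrix for a convolution, in which every output position gets its own shifted copy of the kernel. Multiplying the flattened input feature map by this matrix (as a crossbar would in one step) consumes the whole input exactly once, with no overlapping window re-read, at the cost of a much larger weight array. The helper name unfold_kernel and all variables are hypothetical.

```python
import numpy as np

def unfold_kernel(kernel, in_h, in_w):
    """Build an unfolded (Toeplitz-style) weight matrix that applies a
    k x k kernel at every valid output position of an in_h x in_w input.

    Each column holds one shifted copy of the kernel, so a single
    product with the flattened input computes the whole convolution
    without re-reading any overlapping input window.
    """
    k = kernel.shape[0]
    out_h, out_w = in_h - k + 1, in_w - k + 1
    W = np.zeros((in_h * in_w, out_h * out_w))
    for oy in range(out_h):
        for ox in range(out_w):
            col = oy * out_w + ox            # one column per output pixel
            for ky in range(k):
                for kx in range(k):
                    row = (oy + ky) * in_w + (ox + kx)
                    W[row, col] = kernel[ky, kx]
    return W

# Example: a 3x3 kernel over a 5x5 input feature map.
rng = np.random.default_rng(0)
ifm = rng.standard_normal((5, 5))
kernel = rng.standard_normal((3, 3))

W = unfold_kernel(kernel, 5, 5)   # 25 x 9 "crossbar" image of the kernel
out = ifm.reshape(-1) @ W         # input fed once -> all 9 outputs

# Cross-check against a direct sliding-window convolution.
ref = np.array([[np.sum(ifm[y:y + 3, x:x + 3] * kernel)
                 for x in range(3)]
                for y in range(3)]).reshape(-1)
assert np.allclose(out, ref)
```

With a 3x3 kernel on a 5x5 input, a direct sliding window reads each interior input element up to nine times, whereas the unfolded 25x9 matrix consumes the input vector in a single pass; this is the kind of computation-for-input-movement exchange the abstract quantifies.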

Parallel Keywords

Nonvolatile; NVM; Convolutional Neural Network; CNN; crossbar
