
考慮硬體行為之基於電阻式記憶體的神經網路加速器高階設計框架

High-Level Design Framework for ReRAM-based DNN Accelerator with Hardware Behaviors Considered

Advisors: 黃世旭, 鄭維凱

Abstract


Using DNN accelerators to speed up neural network computation is a clear trend, but conventional accelerators are built on the Von Neumann architecture, so frequent data movement and limited transmission bandwidth create a serious performance bottleneck. In-memory computing is an emerging accelerator architecture; in particular, ReRAM-based DNN accelerators have been shown to overcome the bottleneck caused by the Von Neumann architecture. To shorten the design time of such accelerators, a complete design framework is indispensable. This thesis therefore proposes a design framework for ReRAM-based DNN accelerators with the following features.

On the software side, we propose algorithms that find the optimized data flow for a neural network and integrate it with our proposed accelerator architecture; our compiler is built on the open-source DNN compiler TVM. First, we partition the network model so that it can be computed in parallel, reducing computation time and improving performance. Second, to cope with the limited on-chip storage of the accelerator, we develop an optimization algorithm that partitions and fuses the model according to the best solution found, maximizing resource utilization.

On the hardware side, for the ReRAM-based accelerator: first, we take the nonlinear characteristics of ReRAM into account, including the lognormal distribution, leakage current, the IR-drop effect, and sneak paths, so that both model accuracy and performance can be evaluated precisely. Second, we build our hardware virtual platform in SystemC using the TLM modeling approach; to the best of our knowledge, ours is the first design framework whose simulator models the real hardware behaviors of ReRAM-based DNN accelerators. Third, the proposed framework simulates accuracy and hardware performance simultaneously, so it can be used for hardware architecture exploration.

In the experiments, we evaluate the proposed framework on several deep neural networks. The results show that our method partitions the networks and generates efficient data flows that relieve the performance bottleneck caused by insufficient on-chip storage, and that hardware performance can be evaluated very quickly on the proposed virtual platform. Furthermore, the results show that different accelerator architectures suffer different nonlinear effects; with the proposed framework, hardware architecture parameters can be easily adjusted to find the best accelerator architecture and mitigate the impact of nonlinear effects on the DNN model.

Parallel Abstract


It is a trend to use DNN accelerators to speed up neural network computation, but traditional neural network accelerators are designed around the Von Neumann architecture, so frequent data movement and limited transmission bandwidth cause a serious performance bottleneck. In-memory computing is a rising accelerator architecture, and ReRAM-based neural network accelerators in particular have been proven to resolve the bottleneck brought by the Von Neumann architecture. To shorten the design time of neural network accelerators, a complete design framework is indispensable. This thesis therefore proposes a ReRAM-based neural network design framework with the following features.

For the software part, we propose an algorithm that finds the best-optimized data flow for a neural network and integrates it with our proposed accelerator architecture. Our compiler is based on the open-source DNN compiler TVM. First, we partition the network model so that it can be computed in parallel, reducing computation time and increasing performance. Second, to address the limited storage space of the accelerator, we develop an optimization algorithm that partitions and fuses the model according to the best solution found, maximizing resource utilization.

For the hardware part, targeting ReRAM-based neural network accelerators: first, we take the nonlinear characteristics of ReRAM into account, including leakage current, the lognormal distribution, sneak paths, and the IR-drop effect, so that both the model's accuracy and its performance on the accelerator can be evaluated precisely. Second, we build the proposed hardware virtual platform in SystemC using the TLM modeling approach.
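The abstract names the non-idealities it models (lognormal variation, IR drop, sneak paths, leakage) but not the equations used. As a minimal toy sketch of how two of these effects distort a crossbar matrix-vector product (the function name, parameter values, and the first-order IR-drop attenuation are our own illustrative assumptions, not the thesis's actual models):

```python
import numpy as np

def crossbar_mvm(w_norm, v_in, g_on=1e-4, g_off=1e-6,
                 sigma=0.0, ir_alpha=0.0, rng=None):
    """Analog crossbar dot product: each bitline current is sum_i V_i * G_ij,
    with two simple non-idealities:
      - lognormal device-to-device conductance variation (sigma), and
      - a first-order IR-drop attenuation growing with row depth (ir_alpha)."""
    rng = np.random.default_rng(rng)
    # Map normalized weights in [0, 1] onto the device conductance range.
    g = g_off + np.clip(w_norm, 0.0, 1.0) * (g_on - g_off)
    # Lognormal variation around the programmed conductance target.
    g = g * rng.lognormal(mean=0.0, sigma=sigma, size=g.shape)
    # Crude IR-drop model: cells deeper along each bitline see less voltage.
    atten = (1.0 / (1.0 + ir_alpha * np.arange(g.shape[0])))[:, None]
    return (v_in[:, None] * g * atten).sum(axis=0)  # one current per column
```

With `sigma=0` and `ir_alpha=0` this reduces to the ideal weighted sum, which makes it easy to quantify how much accuracy each non-ideality costs in isolation.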
To the best of our knowledge, the proposed design framework for ReRAM-based DNN accelerators is the first simulator with a virtual platform that not only models real memory behaviors but also takes system-level information into account. Third, the proposed framework simulates both accuracy and hardware performance, so it can be used for hardware architecture exploration. In the experiments, we evaluated the proposed design framework on several deep neural networks. The results demonstrate that the proposed method can partition the neural network, generate efficient data flow, and relieve the performance bottleneck caused by the accelerator's insufficient storage space. In addition, hardware performance can be evaluated very quickly on the proposed virtual platform. The experimental results also show that different hardware accelerator architectures suffer different non-ideal effects; therefore, using the proposed design framework, hardware architecture parameters can be adjusted to find the best DNN accelerator architecture and mitigate the impact of nonlinear effects on the DNN model.
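The abstract describes partitioning and fusing layers so that intermediate data fits in limited on-chip storage, without giving the algorithm itself. As a toy illustration of buffer-constrained layer fusion (the greedy policy, names, and byte sizes below are our own illustrative assumptions, not the thesis's optimization algorithm):

```python
def fuse_layers(fmap_bytes, buffer_bytes):
    """Greedily pack consecutive layers into fused groups whose combined
    intermediate feature maps fit in the on-chip buffer.  A new group
    starts whenever adding the next layer would overflow the buffer, so
    only group boundaries ever spill feature maps to off-chip memory."""
    groups, cur, cur_sum = [], [], 0
    for i, size in enumerate(fmap_bytes):
        if cur and cur_sum + size > buffer_bytes:
            groups.append(cur)            # close the current fused group
            cur, cur_sum = [], 0
        cur.append(i)                     # layer i joins the open group
        cur_sum += size
    if cur:
        groups.append(cur)
    return groups
```

For example, `fuse_layers([4, 4, 4, 4], 8)` yields `[[0, 1], [2, 3]]`: two fused groups, each keeping 8 bytes of feature maps on chip. A real optimizer would also weigh recomputation and parallelism, as the thesis's algorithm does across partitioned sub-models.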

