
基於非揮發性記憶體系統之高效能類神經網路的模糊運算策略

Achieving High-performance Neural Networks on NVM-based System by Exploiting Approximate Computing

Advisor: Tei-Wei Kuo (郭大維)
Co-advisor: Yuan-Hao Chang (張原豪)


Abstract


Neural networks on conventional computing platforms are heavily restricted by data volume and performance concerns. Conventional DRAM-based systems encounter multiple issues, including scaling difficulty, insufficient capacity, and leakage power. While non-volatile memory (NVM) offers a potential solution to the data-volume issue, performance challenges remain, especially the asymmetry between read and write performance. In addition, other critical concerns, such as reliability and endurance, must be resolved before non-volatile memory can be used in practice for neural-network applications. This dissertation proposes methodologies from different viewpoints to resolve the design issues of running neural networks on NVM-based systems. Specifically, we leverage the concept of approximate computing in neural networks while taking the unique characteristics and operations of NVMs into design consideration, aiming to achieve high-performance neural networks on NVM-based systems. The first part of this dissertation addresses the memory-capacity and performance needs of the training phase by exploiting lossy programming to approximately write intermediate data and weights. Specifically, the proposed data-aware programming (DAP) design exploits Dual-SET operations in consideration of data-flow and data-content analysis and neural-network characteristics. The second part addresses the performance and speech-quality needs of the inference phase by exploiting quantization with data-reshaping approaches to enable analog-based multiply-and-accumulate (MAC) operations with floating-point numbers on NVM accelerators. The proposed quantization with data-reshaping approaches resolves the inaccuracy of current summation over the crossbar by reshaping the weights and biases of neural network models.
The third part of this dissertation addresses the performance and accuracy needs of the inference phase by exploiting a 1.5-bit MLC 3D-NAND-based Intelligent Query Processing Engine (IQPE) to enable digital-based MAC operations on NVM accelerators. Specifically, the proposed approximate 1.5-bit MLC 3D-NAND-based IQPE design exploits 3D NAND built-in operations in consideration of the approximate-computing characteristics of intelligent queries. A series of experiments was conducted to evaluate the capability of the proposed designs, and the results are encouraging.
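The abstract only summarizes the designs, so as a rough illustration of the analog-MAC idea in the second part, the following sketch simulates a crossbar dot product in which floating-point weights are quantized to a few discrete conductance levels before the "current" is summed. All function names, the level count, and the symmetric quantization scheme are illustrative assumptions for this sketch, not the dissertation's actual method.

```python
import numpy as np

def quantize_weights(w, n_levels=16):
    """Map floating-point weights onto n_levels discrete conductance
    levels, the way a multi-level NVM crossbar cell would store them.
    Returns the integer level codes and the scale used."""
    scale = np.max(np.abs(w)) / (n_levels // 2)
    q = np.clip(np.round(w / scale), -(n_levels // 2), n_levels // 2 - 1)
    return q, scale

def crossbar_mac(x, q, scale):
    """Approximate MAC: inputs drive the word lines, quantized weights
    act as conductances, and the accumulated bit-line 'current' is
    rescaled back into a floating-point dot product."""
    return np.dot(x, q) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # word-line inputs
w = rng.standard_normal(64)   # original floating-point weights

q, scale = quantize_weights(w)
exact = np.dot(x, w)
approx = crossbar_mac(x, q, scale)
print(f"exact={exact:.4f}  approx={approx:.4f}  abs-err={abs(exact - approx):.4f}")
```

The gap between `exact` and `approx` grows as the number of conductance levels shrinks or the weight range widens, which is one way to see why reshaping weights and biases before quantization, as the second part proposes, can matter for accuracy.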

