As a significant advancement in the field of artificial intelligence, spiking neural networks offer a unique, brain-inspired approach to computational problem-solving, reasoning, and inference. This potential has driven many organizations and industries to invest in algorithm and hardware development aimed at meeting upcoming technological challenges. However, existing hardware and algorithmic solutions remain limited in the cost-effectiveness of emulation hardware and the efficiency of deep neural model inference. This thesis presents a hardware design and algorithmic improvements aimed at cost-effective and efficient operation. On the hardware side, the design uses computational non-volatile memory, which eliminates weight movement, lowers per-bit cost, and improves area efficiency, collectively raising processing throughput at a lower cost than prior architectures. On the algorithmic side, this work develops a neural model tailored to a spiking neural network accelerator. Realizing this hardware-algorithm co-design requires addressing three critical challenges: (1) reduced reliability due to imperfections in memory cells; (2) throughput loss from the large granularity and long access latency of non-volatile memory macros; and (3) the many spiking cycles required to reach accurate computational results. In response, we conduct a reliability analysis of various memory devices for image classification and optimization problem-solving. We further propose architectural designs that recover throughput otherwise lost to sparse inputs or low input degrees during network inference. We also propose transforming neural models into binary neural networks, trading a small loss in accuracy for substantial gains in processing speed and energy efficiency on image classification. Our results show that calibrating cells with a small ON-OFF ratio to a high signal-to-noise ratio incurs a substantial capacitance cost, so transistor-based memory is preferable. The proposed design significantly outperforms previous digital SNN processors, achieving 3.1x, 1.8x, and 2.2x speedups on MAX-CUT, SUDOKU, and LASSO tasks, respectively. Finally, compared with 4-bit SNNs, integrating the error-resilient binary neural network (ER-BNN) into the probabilistic inference machine halves the capacitor size, reduces energy consumption by 57%, and lowers latency by two orders of magnitude, all while preserving classification accuracy.
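To make the spiking-inference mechanism concrete, the sketch below shows a minimal leaky integrate-and-fire (LIF) layer with binary weights, the kind of rate-coded computation that an SNN accelerator of this class emulates. This is an illustrative sketch only: the layer sizes, leak factor, threshold, and all identifiers are assumptions for demonstration, not values or code from the thesis.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) layer with binary {-1, +1} weights.
# All sizes and constants below are hypothetical, chosen for illustration.

rng = np.random.default_rng(0)

N_IN, N_OUT = 64, 8    # layer dimensions (hypothetical)
T_STEPS = 32           # number of spiking time steps per inference
LEAK = 0.9             # membrane leak factor applied each step
V_TH = 4.0             # firing threshold

# Binary weights stand in for a binarized (BNN-style) weight matrix.
W = rng.choice([-1.0, 1.0], size=(N_OUT, N_IN))

# Rate-code a random input vector: spike probability tracks input intensity.
x = rng.random(N_IN)

v = np.zeros(N_OUT)             # membrane potentials
spike_counts = np.zeros(N_OUT)  # accumulated output spikes

for t in range(T_STEPS):
    in_spikes = (rng.random(N_IN) < x).astype(float)  # Bernoulli rate coding
    v = LEAK * v + W @ in_spikes                      # leaky integration of weighted spikes
    fired = v >= V_TH                                 # threshold crossing
    spike_counts += fired
    v[fired] = 0.0                                    # reset membrane after firing

# Output spike rates approximate the layer's activations over T_STEPS.
print("output spike rates:", spike_counts / T_STEPS)
```

Note that each inference sweeps the weights once per time step, which illustrates challenge (3) above: accuracy improves with more spiking cycles, at a direct cost in latency and energy, and it is this cost that binarization and the accelerator design aim to reduce.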