
Software-Hardware Co-design and Implementation of Recurrent Neural Networks Using Approximate Activation Functions

Advisor: 湯松年

Abstract


As technology advances, growth in the computing power of hardware platforms has made edge AI (Edge-AI) for neural network workloads an increasingly mainstream research topic. This study implements a deep-learning inference model based on Long Short-Term Memory (LSTM), a variant of Recurrent Neural Networks (RNN), and uses software-hardware co-design to simulate and verify its most frequently executed operation, matrix-vector multiplication (MVM), as a physical circuit on an SoC-FPGA, completing the development of the full software and hardware stack.

Neural network models compute over large numbers of weight parameters, so under the limited memory resources of hardware, accessing these data efficiently is especially important. This study replaces the weight matrices of the LSTM model with circulant matrices, which reduces the space complexity of the weights from O(n²) to O(n). With a hidden-layer size of 256 and a block size of 32, the model saves 96.63% of storage resources while reaching accuracy close to that of the original LSTM without circulant matrices. To further reduce hardware computation, we simulate the hardware operations and introduce approximate activation functions that replace the Tanh and Sigmoid functions in the LSTM, aiming to keep the outputs close to the expected values while reducing the number of hardware arithmetic units.
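
As a rough sanity check on the claimed savings (my arithmetic, not a figure from the thesis): a block-circulant n×n matrix with block size k needs only n²/k stored values instead of n², so k = 32 removes about 1 − 1/32 ≈ 96.9% of the weights, broadly in line with the reported 96.63% (the small gap plausibly comes from parameters that are not compressed, such as biases). The NumPy sketch below is a minimal illustration of the idea — each k×k block is defined by a single length-k vector and multiplies a vector via circular convolution (FFT-based here; an actual FPGA datapath would be structured differently) — and every name in it is illustrative rather than taken from the thesis.

```python
import numpy as np

def block_circulant_matvec(blocks, x, k):
    """Compute y = W @ x for a block-circulant W with k-by-k blocks.

    blocks[i][j] is the length-k defining (first) column of the circulant
    block in block-row i, block-column j, so W takes n*n/k stored values
    instead of n*n for a dense matrix.
    """
    p, q = len(blocks), len(blocks[0])
    y = np.zeros(p * k)
    for i in range(p):
        acc = np.zeros(k)
        for j in range(q):
            xj = x[j * k:(j + 1) * k]
            # circulant matvec is a circular convolution, done here via FFT
            acc += np.fft.ifft(np.fft.fft(blocks[i][j]) * np.fft.fft(xj)).real
        y[i * k:(i + 1) * k] = acc
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, p, q = 4, 2, 2  # tiny sizes so the check runs instantly
    blocks = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
    x = rng.standard_normal(q * k)
    # dense reference: column s of a circulant block is the defining
    # column rolled down by s positions
    dense = np.block([[np.stack([np.roll(c, s) for s in range(k)], axis=1)
                       for c in row] for row in blocks])
    assert np.allclose(block_circulant_matvec(blocks, x, k), dense @ x)
```

The demo at the bottom expands each defining column into its dense circulant block and checks that the compressed representation computes the same product.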

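The abstract does not spell out the exact form of the approximate activation functions, so the following sketch uses PLAN (the piecewise-linear approximation of Amin, Curtis, and Hayes-Gill) as one representative hardware-friendly choice; the thesis's functions may well differ. PLAN's segment slopes are all powers of two (2^-2, 2^-3, 2^-5), so each segment costs a shift and an add instead of a multiplier — the kind of reduction in hardware arithmetic units the abstract targets — and Tanh reuses the same unit through the identity tanh(x) = 2·sigmoid(2x) − 1.

```python
import numpy as np

def plan_sigmoid(x):
    """PLAN piecewise-linear sigmoid; slopes are powers of two, so a
    hardware implementation needs only shifts and adds per segment."""
    a = np.abs(x)
    y = np.where(a < 1.0,   0.25 * a + 0.5,
        np.where(a < 2.375, 0.125 * a + 0.625,
        np.where(a < 5.0,   0.03125 * a + 0.84375,
                 1.0)))
    return np.where(x >= 0.0, y, 1.0 - y)  # sigmoid(-x) = 1 - sigmoid(x)

def plan_tanh(x):
    # tanh(x) = 2 * sigmoid(2x) - 1 lets one approximation unit serve
    # both the Sigmoid and Tanh gates of the LSTM cell
    return 2.0 * plan_sigmoid(2.0 * x) - 1.0

if __name__ == "__main__":
    xs = np.linspace(-8.0, 8.0, 10001)
    print("max sigmoid error:", np.abs(plan_sigmoid(xs) - 1/(1+np.exp(-xs))).max())
    print("max tanh error   :", np.abs(plan_tanh(xs) - np.tanh(xs)).max())
```

The demo prints the maximum deviation from the exact functions over [−8, 8], which is how one would judge whether such an approximation keeps the LSTM outputs close to the expected values.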

Keywords

LSTM; Circulant Matrix

