  • Thesis

A Reconfigurable Accelerator and Software-Hardware Co-design for Neural Networks Based on Circulant Matrix

Advisor: 湯松年

Abstract


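With artificial intelligence advancing rapidly, embedded AI applications are an inevitable trend. Deep learning delivers excellent accuracy for inference and training on embedded systems, but its heavy computation and large parameter counts are a challenge for such systems. How to perform fast, accurate inference with a limited processor, memory, and chip area is an active research direction. In this thesis, we replace the weight matrix of the fully connected layer with a circulant matrix. Fully connected layers account for more than 90% of the parameters in deep learning architectures, and this method greatly reduces the parameter count at the cost of a small drop in accuracy. It improves the space complexity from O(n^2) to O(n), which saves a significant amount of memory. We implement hardware acceleration of the circulant-matrix fully connected layer on an SoC FPGA and combine it with software to realize image recognition based on a convolutional neural network. Experiments show that, on two common datasets, the parameter count is reduced by 99% while recognition accuracy drops by at most 0.8%, and the designed hardware accelerator is 256 times faster than the dual-core ARM Cortex-A9 processor of the development platform.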

Parallel Abstract


Deep learning requires heavy computation and a large number of parameters, both of which are challenges for embedded systems. Making fast, accurate inferences with limited resources is an active research direction. In this work, we replace the weight matrix of the fully connected layer with a circulant matrix, which improves the space complexity from O(n^2) to O(n). We design a hardware accelerator for the circulant-matrix fully connected layer and use software-hardware co-design to realize image recognition based on a convolutional neural network. Experiments show that, on two standard datasets, the parameter count is reduced by 99% while inference accuracy drops by at most 0.8%, and the designed hardware accelerator is 256 times faster than the dual-core ARM Cortex-A9 processor of the development platform.
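
The abstract does not show how the circulant layer is computed, so the following is only a minimal sketch of the storage argument, not the thesis's accelerator design. It assumes a square layer whose weight matrix is fully determined by its first column `c`, with entry (i, j) equal to `c[(i - j) % n]`. The function names are hypothetical and chosen for illustration.

```python
def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by first column c with vector x.

    Entry (i, j) of the matrix is c[(i - j) % n], so only the n values in c
    are stored instead of the n * n values of a dense weight matrix.
    """
    n = len(c)
    if len(x) != n:
        raise ValueError("c and x must have the same length")
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def dense_circulant(c):
    """Expand c into the full n x n circulant matrix (for checking only)."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def dense_matvec(w, x):
    """Ordinary dense matrix-vector product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


c = [1.0, 2.0, 3.0, 4.0]   # the n stored parameters (illustrative values)
x = [1.0, 0.0, -1.0, 2.0]  # layer input (illustrative values)

y_compact = circulant_matvec(c, x)             # uses 4 stored weights
y_dense = dense_matvec(dense_circulant(c), x)  # uses 16 stored weights
print(y_compact)  # [2.0, 4.0, 10.0, 4.0]
```

Both paths give the same output, but the compact form keeps n weights where the dense form keeps n * n, which is the O(n^2) to O(n) reduction the abstract describes. This sketch only reduces storage; the multiplication itself still takes O(n^2) operations here.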

