

IP-Based Accelerator with Adjustable Mapping Parallelism Dataflow for Task-Specific DNN

Advisor: 陳良基

Abstract


AI is widely used across domains. To handle the complexity of models with billions of parameters and rapidly evolving architectures, NN models and computational power need to be tightly integrated. While general-purpose accelerators use complex interconnection networks to adapt to model variations, task-specific accelerators offer a better solution. Through analysis, we found that NN model variations are gradual and predictable. We propose a new architecture that divides an NN model into subsets with similar computational characteristics. By mapping these subsets onto optimized sub-accelerators, we achieve a high degree of integration between computational power and the NN model. On ResNet-50, our architecture reduces PE usage by an average of 32% and energy cost by 24% compared with state-of-the-art accelerators. For image-segmentation models such as UNet, it reduces PE usage by 49% and energy cost by 39%.
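The partitioning idea in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the thesis's actual method: the layer shapes are ResNet-50-like placeholders, and the grouping key and PE-sizing rule (`characteristic`, the cap of 64) are invented for demonstration only.

```python
# Hypothetical sketch: group layers with similar computational
# characteristics into subsets, then size one sub-accelerator per
# subset instead of provisioning for the whole model's worst case.
from collections import defaultdict

# (stage, output_channels, spatial_size) -- illustrative values only,
# loosely modeled on ResNet-50 stages, not the thesis's profiling data.
layers = [
    ("conv2_x", 256, 56),
    ("conv2_x", 256, 56),
    ("conv3_x", 512, 28),
    ("conv3_x", 512, 28),
    ("conv4_x", 1024, 14),
    ("conv5_x", 2048, 7),
]

def characteristic(layer):
    """Grouping key: here simply the channel/spatial profile."""
    _, channels, spatial = layer
    return (channels, spatial)

# Partition layers into subsets with identical characteristics.
subsets = defaultdict(list)
for layer in layers:
    subsets[characteristic(layer)].append(layer)

# Each subset maps to a sub-accelerator whose PE array is sized to
# that subset's parallelism needs (toy sizing rule: channels, capped).
for (channels, spatial), members in subsets.items():
    pe_needed = min(channels, 64)
    print(f"subset ({channels}, {spatial}): "
          f"{len(members)} layers -> {pe_needed} PEs")
```

Because variations within each subset are gradual, a sub-accelerator tuned to one subset serves all of its layers efficiently, which is where the claimed PE and energy savings would come from.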

