Thesis

A Sparse Graph Remapping Algorithm for Operation Unit Usage and Power Optimization in Crossbar Resistive Random-Access Memory

SGIRR: Sparse Graph Index Remapping for ReRAM Crossbar Operation Unit and Power Optimization

Advisor: 張耀文 (Yao-Wen Chang)
Co-advisor: 張原豪 (Yuan-Hao Chang)

Abstract


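Resistive random-access memory (ReRAM) is a highly promising processing-in-memory (PIM) technology that can effectively reduce the cost of data movement between computation units and memory units in large-scale, complex graph processing. ReRAM cells can be combined with crossbar arrays to accelerate graph processing, and partitioning a ReRAM crossbar array into operation units (OUs) can further improve the computation accuracy of ReRAM crossbars. Previous designs did not specifically optimize OU utilization, which incurred extra computation cost and energy loss. To address these shortcomings, this thesis proposes a two-stage algorithm that takes the OUs on the crossbar as its optimization target and remaps the row and column order of a sparse graph to cluster its valid (nonzero) data, mitigating the excess energy consumption and computation cost caused by graph sparsity. The index remapping algorithm takes the given OU size into account to optimize OU utilization and energy consumption. Experimental results show that, compared with the unoptimized baseline, the proposed algorithm reduces the total number of crossbar OUs used by 65.0%, improves overall OU utilization by 36.34%, and saves 50.4% of energy consumption on average. Compared with an existing algorithm, it reduces the total number of crossbar OUs used by 31.4%, improves overall OU utilization by 10.6%, and saves 17.2% of energy consumption on average.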
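The two OU metrics reported above (number of OUs used and OU utilization) can be made concrete with a small counting routine. The sketch below is a minimal illustration under stated assumptions, not the thesis's evaluation code: it assumes an OU is an `ou_rows` × `ou_cols` block of the adjacency matrix mapped onto the crossbar, that OUs tile the matrix on a fixed grid starting at (0, 0), that an OU counts as used when it holds at least one nonzero, and that utilization is the fraction of cells in used OUs that hold a nonzero. The names `ou_stats` and `OUStats` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OUStats:
    used_ous: int       # OU blocks holding at least one nonzero
    utilization: float  # fraction of cells inside used OUs that hold a nonzero


def ou_stats(edges, ou_rows, ou_cols):
    """Count the OU blocks touched by the nonzeros of a sparse adjacency matrix.

    edges: iterable of (row, col) positions of nonzero entries; duplicates are ignored.
    ou_rows, ou_cols: OU height and width (hypothetical parameters).
    Assumption: OUs tile the matrix on a fixed grid starting at (0, 0).
    """
    nonzeros = set(edges)
    # Each nonzero falls into exactly one grid-aligned OU block.
    blocks = {(r // ou_rows, c // ou_cols) for r, c in nonzeros}
    if not blocks:
        return OUStats(0, 0.0)
    capacity = len(blocks) * ou_rows * ou_cols
    return OUStats(len(blocks), len(nonzeros) / capacity)


# 4x4 graph with 2x2 OUs: four scattered nonzeros land in four different OUs.
stats = ou_stats([(0, 0), (0, 2), (2, 0), (2, 2)], 2, 2)
print(stats)  # OUStats(used_ous=4, utilization=0.25)
```

Under these assumptions, each scattered nonzero occupies its own OU, so three quarters of the activated cells do no useful work; clustering the nonzeros into fewer OUs is what an index remapping aims to achieve.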

English Abstract


Resistive random-access memory (ReRAM) crossbars are a promising processing-in-memory (PIM) technology for reducing the enormous data-movement overhead between computation and memory units in large-scale graph processing. ReRAM cells can be combined with crossbar arrays to effectively accelerate graph processing, and partitioning ReRAM crossbar arrays into operation units (OUs) can further improve the computation accuracy of ReRAM crossbars. Previous work did not optimize OU utilization, incurring extra computation cost and energy consumption. In this thesis, we propose SGIRR, a two-stage sparse graph index remapping algorithm with a crossbar OU-aware scheme for ReRAM crossbars, which mitigates the influence of graph sparsity. In particular, this thesis is the first to incorporate the given OU size into the index remapping algorithm, optimizing both OU utilization and power dissipation. Compared with the baseline, experimental results show that our proposed algorithm reduces the total number of crossbar OUs used by 65.0%, improves overall OU utilization by 36.34%, and reduces energy consumption by 50.4% on average. Compared with previous work, it reduces the total number of crossbar OUs used by 31.4%, improves overall OU utilization by 10.6%, and reduces energy consumption by 17.2% on average.
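To illustrate why reordering the rows and columns of a sparse graph can shrink the number of OUs it occupies, the sketch below applies a simple degree-sorting heuristic. This is a swapped-in stand-in, not the two-stage SGIRR algorithm, whose details are not given here, and it is not guaranteed to reduce the OU count in general. It assumes rows and columns may be permuted independently, with the input and output vertex vectors remapped to match, and that OUs tile the matrix on a fixed grid. The helpers `degree_order`, `remap`, and `count_ous` are hypothetical.

```python
def degree_order(edges, axis, size):
    """Return indices along one axis (0 = rows, 1 = columns), densest first.

    Python's sort is stable, so ties keep their original relative order.
    """
    counts = [0] * size
    for e in edges:
        counts[e[axis]] += 1
    return sorted(range(size), key=lambda i: -counts[i])


def remap(edges, row_order, col_order):
    """Apply row/column permutations, where row_order[new] == old.

    Assumption: the input/output vertex vectors are permuted the same way.
    """
    new_row = {old: new for new, old in enumerate(row_order)}
    new_col = {old: new for new, old in enumerate(col_order)}
    return [(new_row[r], new_col[c]) for r, c in edges]


def count_ous(edges, ou_rows, ou_cols):
    """Number of grid-aligned OU blocks containing at least one nonzero."""
    return len({(r // ou_rows, c // ou_cols) for r, c in edges})


edges = [(0, 0), (0, 2), (2, 0), (2, 2)]
rows = degree_order(edges, axis=0, size=4)  # [0, 2, 1, 3]
cols = degree_order(edges, axis=1, size=4)  # [0, 2, 1, 3]
remapped = remap(edges, rows, cols)         # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(count_ous(edges, 2, 2), "->", count_ous(remapped, 2, 2))  # 4 -> 1
```

Sorting by degree ignores the OU grid entirely; an OU-aware scheme such as the one described above would instead choose the ordering with the given OU size in mind.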
