A 6T SRAM Computing-in-Memory Architecture Based on High Input Precision Computing Cells for Multi-Bit Convolutional Neural Networks

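With the rapid development of artificial intelligence and convolutional neural networks, demand for supporting hardware has grown. In the conventional von Neumann architecture, large volumes of data are transferred between memory and the computing unit, which consumes a great deal of power; this is known as the von Neumann bottleneck. Computing-in-memory (CIM) circuits are a promising way to address this bottleneck: their design goal is to complete all computation inside the memory, thereby reducing the power lost to data transfer. A CIM circuit serves as both memory and computing unit. Data is computed on where it is stored, which removes much of the data movement and also offers higher parallelism. To this end, this work stores neural network features in the memory and activates them so that the CIM circuit performs multiply-and-accumulate (MAC) operations. This work proposes an SRAM-CIM macro that uses 6T SRAM to perform multi-bit MAC operations. The macro uses (1) a 6T high input precision computing cell, which processes 8-bit inputs with 1-bit weights in parallel and provides a compact area, and (2) a global bit line (GBL) combining circuit, which reduces the number of sensing circuits the macro needs and thereby improves its energy efficiency. The fabricated 28nm 384Kb SRAM-CIM macro performs MAC operations with up to 8-bit inputs and 8-bit weights at 20-bit output precision, achieving a computing time of 3.8ns and an energy efficiency of 14.97TOPS/W.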

Keywords

Computing-in-Memory; Static Random-Access Memory; Memory; Convolutional Neural Network

Parallel Abstract

With the rapid development of artificial intelligence and convolutional neural networks, the demand for related hardware has increased. In the von Neumann architecture, however, the large volume of data moved between the memory and the computing unit consumes a large amount of power. This is known as the von Neumann bottleneck. Computing-in-Memory (CIM) is a promising way to address this bottleneck: its goal is for the memory to both store data and compute on it. A CIM architecture therefore needs no separate computing unit and offers higher parallelism. Data movement between memory and computing unit is significantly reduced, and so is power consumption. To achieve these objectives, this work feeds the neural network feature map into the memory array and activates it in parallel, so that the memory performs multiply-and-accumulate (MAC) operations. This work proposes an SRAM-CIM macro based on a compact 6T SRAM cell to perform multi-bit MAC operations. The macro uses (1) a High Input Precision Computing Cell (HIPCC), which performs 8-bit input and 1-bit weight computation in a compact array area, and (2) a global bit line (GBL) combining method, which reduces the number of sensing circuits required and thus improves energy efficiency. The fabricated 28nm 384Kb SRAM-CIM macro performs MAC operations with up to 8-bit inputs, 8-bit weights, 16-channel accumulation, and 20-bit output precision, and achieves a computing time of 3.8ns and an energy efficiency of 14.97TOPS/W.
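The arithmetic behind a multi-bit MAC built from 1-bit-weight operations can be sketched in software. The Python below is a functional illustration only, not the macro's circuit: it assumes unsigned inputs and weights, and it assumes the 1-bit-weight partial sums are recombined by shift-and-add, neither of which the abstract specifies. The names `weight_bit_planes`, `mac_bit_sliced`, and `mac_reference` are hypothetical.

```python
from typing import List, Sequence

INPUT_BITS = 8    # input precision named in the abstract
WEIGHT_BITS = 8   # assumed weight precision for this sketch
CHANNELS = 16     # accumulation length named in the abstract


def weight_bit_planes(weights: Sequence[int], bits: int = WEIGHT_BITS) -> List[List[int]]:
    """Split unsigned weights into 1-bit planes, least significant bit first."""
    return [[(w >> b) & 1 for w in weights] for b in range(bits)]


def mac_bit_sliced(inputs: Sequence[int], weights: Sequence[int],
                   bits: int = WEIGHT_BITS) -> int:
    """MAC computed as a sum of (multi-bit input x 1-bit weight) partial sums.

    Each plane models one pass in which every cell multiplies a full input
    by a single weight bit. The shift-and-add recombination is an assumption.
    """
    total = 0
    for b, plane in enumerate(weight_bit_planes(weights, bits)):
        partial = sum(x * wb for x, wb in zip(inputs, plane))
        total += partial << b  # weight the partial sum by its bit position
    return total


def mac_reference(inputs: Sequence[int], weights: Sequence[int]) -> int:
    """Direct multiply-and-accumulate, used to check the bit-sliced version."""
    return sum(x * w for x, w in zip(inputs, weights))


if __name__ == "__main__":
    # Worst case for unsigned operands: every input and weight at its maximum.
    xs = [(1 << INPUT_BITS) - 1] * CHANNELS
    ws = [(1 << WEIGHT_BITS) - 1] * CHANNELS
    result = mac_bit_sliced(xs, ws)
    print(result, result.bit_length())
```

Under these assumptions, the worst case of 16 accumulated products of two 8-bit values needs 8 + 8 + log2(16) = 20 bits, which is consistent with the 20-bit output precision reported for the macro.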

Parallel Keywords

Computing-in-Memory; SRAM; Memory; Convolutional Neural Network
