A number of recent studies have targeted accelerators for Convolutional Neural Networks (CNNs) in pursuit of better performance than the current Von Neumann architecture provides. As computing power keeps growing while memory speed grows comparatively slowly, data movement between computing units and memory devices has become the bottleneck limiting the performance of these accelerators. To reduce data movement, the Processing-In-Memory (PIM) architecture has been studied and advocated in many papers. However, PIM architectures can only reduce data movement between off-chip memory and the chip; the movement between on-chip buffers and computing units remains. To further reduce data movement in this segment, this thesis proposes a kernel-unfolding method that trades on-chip computing power for higher utilization of the input feature map data, reducing the volume of input data moved without re-inputting the same data. The proposed method achieves improvements of 16.2×, 1.62×, and 19.2× in cycle count, execution time, and input data volume, respectively.
A number of recent works have aimed to design accelerators for Convolutional Neural Networks (CNNs) that outperform the current Von Neumann architecture. As computing power keeps growing while memory speed grows relatively slowly, data movement between computing units and memory devices has become the bottleneck that limits the performance of these accelerators. To reduce this data movement, the Processing-In-Memory (PIM) architecture is widely advocated. However, PIM can only decrease off-chip data movement; the movement between on-chip buffers and computing units remains. To further decrease this on-chip data movement, we propose a kernel-unfolding approach that trades computing power for a lower volume of input data movement by fully utilizing the input feature map data without inputting overlapped data. The proposed approach yields improvements of 16.2×, 1.62×, and 19.2× in cycle count, execution time, and input data volume, respectively.
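To give an intuition for the reported input-data reduction, the following is a minimal sketch, not the thesis's actual method: it counts, for an assumed 1D convolution with stride 1, how many input elements must be streamed on-chip when every output window re-reads its overlapping inputs, versus when each input element is sent on-chip only once and reused. The function names and parameters are hypothetical, chosen purely for illustration.

```python
# Hypothetical illustration (not the thesis's implementation): input-data
# traffic for a 1D convolution, comparing per-window re-reading of
# overlapped inputs against overlap-free, read-each-element-once input.

def naive_input_traffic(n, k, stride=1):
    """Elements streamed when each of the output windows re-reads its k inputs."""
    n_out = (n - k) // stride + 1
    return n_out * k

def reuse_input_traffic(n):
    """Elements streamed when every input element enters the chip exactly once."""
    return n

n, k = 1024, 16
naive = naive_input_traffic(n, k)
reused = reuse_input_traffic(n)
# For stride 1, the savings factor approaches the kernel size k.
print(naive, reused, naive / reused)
```

Under these assumptions the savings grow with the kernel size, which is consistent in spirit with trading computation for fewer overlapped input transfers, though the 19.2× figure above comes from the thesis's own evaluation, not from this sketch.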