A Synthesis Algorithm for Memory Resource Allocation and Data Placement in Multi-Core Systems-on-Chip with 3D-Stacked Reconfigurable SRAM

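Stacking processors and memory in the third dimension using Through-Silicon Via (TSV) technology is regarded as a promising way to address the insufficient memory bandwidth of multi-core systems. Recently, an architecture has been proposed that integrates a multi-core system-on-chip with 3D-stacked, tile-based reconfigurable static random-access memory (SRAM). In this architecture the memory tiles are stacked on top of the processors and, through a reconfigurable on-chip network, are dynamically reorganized according to the dynamic behavior of the system. However, the reconfiguration must be performed correctly according to system needs so that memory system performance is maximized and reconfiguration cost is minimized. In this thesis, we therefore propose a synthesis algorithm that automatically determines how the memory tiles are partitioned. To maximize memory system performance, the algorithm decides, in addition to memory tile allocation, how to place data so as to improve the locality of frequently accessed data. The algorithm considers both the behavior of the system's execution scenarios (a scenario is a specific set of applications that execute concurrently on the multi-core system-on-chip) and the characteristics of the 3D-stacked reconfigurable SRAM architecture. Beyond the data access behavior within a single scenario, the algorithm also takes the requirements across all scenarios into account when placing data for each scenario, so that unnecessary off-chip memory accesses are avoided. With these considerations, experimental results show that within each scenario our algorithm reduces the average data access time by 9.3% on average, and across scenarios it reduces execution time by 5.1% on average.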

Keywords

3D-stacked architecture; 3D-stacked reconfigurable SRAM; data placement; memory allocation; system performance; synthesis algorithm

Parallel Abstract

Using Through-Silicon Via (TSV) technology to stack processors and memories in the third dimension is considered one of the most promising methods for alleviating the memory bandwidth problem of multi-core systems. Recently, a tile-based reconfigurable SRAM structure has been proposed for Multi-Processor Systems-on-Chip (MPSoCs) with a 3D die-stacked architecture. The SRAM tiles are stacked on top of the processors or IP cores. Through the reconfigurable on-chip network, the SRAM tiles can be dynamically reconfigured according to the dynamic behavior of the target system. However, the reconfiguration must be performed correctly according to the needs of the target system so that memory system performance is maximized and reconfiguration overhead is minimized. Therefore, in this thesis, we propose a synthesis algorithm that automatically decides how the stacked SRAM tiles are partitioned among the cores. To maximize memory system performance, in addition to SRAM tile allocation, the algorithm decides how to place data so as to increase the locality of frequently accessed data. Our algorithm considers both the workload behavior and the hardware features of the reconfigurable 3D-stacked SRAMs. For the workload behavior, in addition to the data access behavior within a scenario (a specific set of applications that execute concurrently on the MPSoC), the algorithm also considers the requirements across all scenarios when performing data placement for each scenario, so that unnecessary off-chip memory accesses can be avoided. Owing to these considerations, the experimental results show that the algorithm achieves an average 9.3% reduction in average data access latency within a scenario and an average 5.1% reduction in execution time across scenarios.

Parallel Keywords

3D die-stacked; reconfigurable 3D-stacked SRAM; data allocation; memory resource; system performance; synthesis algorithm
