
超大型積體電路設計下,針對效能、功率、散熱之快取記憶體架構探索及設計最佳化

Exploration of cache architecture and design optimization for performance, power, thermal issues in VLSI technology

Advisor: 黃婷婷

Abstract


With advances in semiconductor process technology, the number of transistors that can be placed within a unit of chip area keeps increasing, so electronic components can be designed with greater complexity and functionality. In addition, modern integration and packaging technologies such as System in Package (SiP) and Through Silicon Via (TSV) make it possible to integrate components with different functions on the same chip, further improving system and circuit performance. These advanced technologies, however, also bring new challenges. As more devices are packed into a given chip area, the power density of the circuit increases as well, which causes thermal problems on the chip and degrades the reliability and performance of the system. Moreover, more powerful devices also demand a larger power budget. To achieve a low-power, high-performance system, effectively managing how the devices in the system operate becomes a critical problem. In this dissertation, we propose new architectures, management methods, and optimization techniques at both the cache architecture level and the physical circuit design level to improve system performance, power consumption, and thermal behavior.

First, at the physical design optimization level, we study the thermal dissipation problem of three-dimensional integrated circuits (3D ICs). In prior work, the stacked Through Silicon Via (stacked TSV) structure proposed by Chen et al. effectively improves the heat conduction of 3D ICs, but that structure is applied only to the power delivery network. In this work, we leverage the stacked TSV structure to reduce the operating temperature of the circuit without incurring excessive wirelength overhead. We develop a three-stage placement algorithm that relocates and stacks signal TSVs during the routing stage of 3D ICs.

Next, for the cache memory system, we develop a method that dynamically adjusts the cache configuration based on thread criticality. In prior work, the reconfigurable cache architecture proposed by Zhang et al. improves system performance and power consumption, but that work targets only single-core systems. In our approach, we predict the performance criticality of each thread while a parallel application runs on a multi-core system, and adjust the cache configuration of the multi-core system according to this information.

Finally, for the compressed cache architecture, we propose a compaction-free compressed cache to realize high-performance multi-core systems. Compressed caches are usually applied at the last-level cache: by compressing the stored data, the cache can hold more data blocks. However, because compressed blocks have varying sizes, storage fragmentation occurs in such a cache. When this happens, a compaction process is invoked to rearrange the stored data and create enough contiguous space for incoming compressed blocks. Running the compaction process costs considerable extra execution time and degrades the effectiveness of the compressed cache. In this work, we therefore design a compaction-free compressed cache architecture that eliminates all of the performance overhead required by compaction.
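As a rough illustration of the trade-off described above, the following sketch scores candidate tiles for a signal TSV, rewarding a tile where the tier below already holds a TSV (so the TSVs stack into a vertical heat path toward the heat sink) while penalizing wirelength detour. The tile model, weights, and cost function are illustrative assumptions only, not the three-stage algorithm developed in the dissertation.

from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    x: int
    y: int

def manhattan(a, b):
    return abs(a.x - b.x) + abs(a.y - b.y)

def choose_tsv_tile(source, sink, candidates, tile_temp, tsv_below,
                    alpha=1.0, beta=2.0):
    """Pick the candidate tile with the lowest weighted cost.

    tile_temp[tile] : estimated temperature of the tile (hotter = worse).
    tsv_below[tile] : True if the tier below already has a TSV at this tile, so a
                      TSV placed here forms a stacked, low-resistance heat path.
    """
    best_tile, best_cost = None, float("inf")
    for tile in candidates:
        detour = manhattan(source, tile) + manhattan(tile, sink) - manhattan(source, sink)
        # Stacking on an existing TSV is rewarded: aligned TSVs conduct heat downward.
        thermal = tile_temp[tile] * (0.5 if tsv_below[tile] else 1.0)
        cost = alpha * detour + beta * thermal
        if cost < best_cost:
            best_tile, best_cost = tile, cost
    return best_tile

# Example: the hotter tile still wins because it stacks onto an existing TSV
# and lies on the shortest source-to-sink path.
a, b, c = Tile(0, 0), Tile(4, 4), Tile(2, 2)
print(choose_tsv_tile(a, b, [c, Tile(0, 4)],
                      tile_temp={c: 80.0, Tile(0, 4): 60.0},
                      tsv_below={c: True, Tile(0, 4): False}))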

Abstract (English)


As VLSI process technology advances, the transistor count of a single IC continues to grow, so more complex and powerful devices can be manufactured within a small area. Furthermore, modern integration technologies such as system in package (SiP) and through silicon via (TSV) make it possible to integrate heterogeneous devices within the same chip, greatly improving circuit and system performance. However, these technologies also bring new challenges. Since more and more devices (transistors) are placed in a given area, the power density of the chip also increases. This leads to severe thermal problems that degrade system reliability and performance. Moreover, increasing the number of devices in a single IC requires an additional power budget. To alleviate the power wall, careful device management is required to achieve a high-performance, low-energy system. In this dissertation, cache architecture exploration and design optimization techniques are proposed to address performance, power, and thermal issues at both the system and the physical design levels.

First, at the physical design level, we study stacked signal TSVs for thermal dissipation during global routing of 3D ICs. The stacked TSV structure proposed by Chen et al. is very effective at dissipating heat in 3D ICs, but the original structure is used only in the power network. In this work, we leverage stacked signal TSVs to minimize temperature with a small wiring overhead, and design a three-stage TSV locating algorithm for the global routing stage.

Second, at the system design level, we propose thread-criticality-aware dynamic cache reconfiguration for multi-core systems. The reconfigurable cache proposed by Zhang et al. can improve system performance and energy consumption, but it was applied only to single-core systems. In this work, we dynamically predict the thread criticality of a parallel application and tune the cache architecture of a multi-core system accordingly.

Finally, we introduce a compaction-free compressed cache for high-performance multi-core systems. Compressed caches are usually used in the last-level cache to increase its effective capacity. However, because compressed data blocks have various sizes, storage fragmentation is inevitable in this cache design. When fragmentation occurs, a compaction process is typically invoked to create contiguous storage space; this process incurs extra cycle penalties and degrades the effectiveness of the compressed cache. In this work, we propose a compaction-free compressed cache architecture that completely eliminates the time spent on compaction.
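To make the thread-criticality idea concrete, here is a minimal sketch of an interval-based policy that estimates each core's criticality from hypothetical stall-cycle and shared-cache-miss counters and repartitions last-level-cache ways in proportion to it. The metric, the counter names, and the proportional allocation are assumptions for illustration, not the predictor or reconfiguration mechanism proposed in the dissertation.

def thread_criticality(stall_cycles, l2_misses, miss_penalty=200):
    """A thread that stalls more, or misses more in the shared cache, is assumed
    more likely to sit on the critical path of the parallel program."""
    return stall_cycles + miss_penalty * l2_misses

def partition_ways(counters, total_ways=16, min_ways=1):
    """Distribute cache ways across cores in proportion to thread criticality.

    counters: {core_id: (stall_cycles, l2_misses)} sampled over the last interval.
    Returns {core_id: allocated_ways}, summing exactly to total_ways.
    """
    crit = {c: thread_criticality(s, m) for c, (s, m) in counters.items()}
    # Every core keeps at least min_ways; the spare ways are shared by criticality.
    spare = total_ways - min_ways * len(crit)
    total = sum(crit.values()) or 1
    shares = {c: spare * v / total for c, v in crit.items()}
    alloc = {c: min_ways + int(shares[c]) for c in crit}
    # Hand out the remaining ways to the largest fractional shares.
    leftover = total_ways - sum(alloc.values())
    for c in sorted(shares, key=lambda c: shares[c] - int(shares[c]), reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc

# Example: core 2 is far behind, so it receives most of the 16 ways.
print(partition_ways({0: (1_000, 10), 1: (1_200, 12), 2: (9_000, 300)}))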
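The fragmentation and compaction problem can likewise be illustrated with a toy model. One common way to avoid compaction altogether is to carve a set's data array into fixed-size segments and let a compressed block occupy any free segments rather than a contiguous run, so free space never needs to be coalesced. The segment size and bookkeeping below are assumptions for illustration and do not reproduce the compaction-free architecture designed in the dissertation.

class CompressedSet:
    """One cache set whose data array is split into equal-size segments.

    A compressed block may occupy any free segments, so free space never has to
    be made contiguous -- there is nothing for a compaction pass to do.
    """

    def __init__(self, num_segments=16, segment_bytes=8):
        self.segment_bytes = segment_bytes
        self.free = set(range(num_segments))   # indices of unused segments
        self.blocks = {}                        # tag -> list of segment indices

    def segments_needed(self, compressed_size):
        return -(-compressed_size // self.segment_bytes)  # ceiling division

    def insert(self, tag, compressed_size):
        """Returns True on success; False means a victim must be evicted first."""
        need = self.segments_needed(compressed_size)
        if need > len(self.free):
            return False
        self.blocks[tag] = [self.free.pop() for _ in range(need)]  # any segments will do
        return True

    def evict(self, tag):
        self.free.update(self.blocks.pop(tag))

cache_set = CompressedSet()
cache_set.insert("A", 30)   # occupies 4 segments
cache_set.insert("B", 12)   # occupies 2 segments
cache_set.evict("A")        # freed segments are reusable immediately, no compaction
assert cache_set.insert("C", 26)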

References


[2] Y.-T. Chen, J. Cong, H. Huang, B. Liu, C. Liu, M. Potkonjak, and G. Reinman, “Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design,” in Proc. Design, Automation, and Test in Europe (DATE), 2012, pp. 45-50.
[4] S. Sardashti and D. A. Wood, “Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching,” in Proc. International Symposium on Microarchitecture (MICRO), 2013, pp. 62-73.
[6] M. Pathak and S. K. Lim, “Performance and thermal-aware Steiner routing for 3-D stacked ICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, no. 9, pp. 1373-1386, 2009.
[7] A. R. Alameldeen and D. A. Wood, “Adaptive cache compression for high-performance processors,” in Proc. International Symposium on Computer Architecture (ISCA), 2004, pp. 212-223.
