Architecture Exploration for Power, Yield, and Reliability in VLSI Designs

Modern integration technologies, including system in package (SIP) and through silicon via (TSV), make it possible to put more devices in a single IC design, so more powerful and complex systems can be built within small areas. Alongside these advantages come new challenges. Increasing the number of devices in a given area often results in higher power density, which causes thermal problems that can severely degrade the reliability of modern electronic systems; low-power architecture design and thermal management schemes are therefore required. Modern integration technologies also require more complex fabrication processes, so improving yield is a critical design issue for the IC industry. In this dissertation, architecture designs and optimization techniques are proposed to improve power, reliability, and yield at the cache, memory, and interconnect levels.
First, at the cache level, a low-power cache design called the run-time reconfigurable expandable cache is introduced. The expandable cache proposed by G. Bournoutian and A. Orailoglu is very efficient in reducing miss rate and energy consumption in embedded systems with small area overhead. However, the original expandable cache has only one expansion scheme, which may lead to thrashing. Based on the structure of the expandable cache, we introduce a new cache design with multiple expansion schemes to fit different run-time program behaviors. The expansion scheme of the proposed cache is changed dynamically by executing configuration instructions inserted at compile time. Experimental results on SPEC CPU2000 show that the proposed cache design improves the miss rate by 14.74% compared with the original expandable cache. In terms of energy improvement ratio, with the energy consumption of a 2-way set-associative cache as the baseline, our method is 5.62% higher than the expandable cache.
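To make the idea of switchable expansion schemes concrete, the following is a minimal toy sketch, not the dissertation's or Bournoutian and Orailoglu's actual design. It assumes a direct-mapped cache with one block per set, a power-of-two number of sets, and a hypothetical expansion scheme in which a set may spill into a partner set computed as `set XOR mask`; the names `ReconfigurableExpandableCache`, `configure`, and `count_misses` are illustrative only. `configure` stands in for a compiler-inserted configuration instruction.

```python
from typing import List, Optional


class ReconfigurableExpandableCache:
    """Toy model of a direct-mapped cache whose sets may expand into a partner set.

    Hypothetical simplifications: one block per set, the full block address is
    stored as the tag, num_sets is a power of two, and the run-time "expansion
    scheme" is a mask with partner(set) = set XOR mask.
    """

    def __init__(self, num_sets: int, mask: int) -> None:
        self.num_sets = num_sets
        self.lines: List[Optional[int]] = [None] * num_sets
        self.misses = 0
        self.configure(mask)

    def configure(self, mask: int) -> None:
        # Stands in for a compiler-inserted configuration instruction.
        # Lines placed under the previous scheme stay put; if they become
        # unreachable they simply miss and are overwritten later.
        self.mask = mask % self.num_sets

    def access(self, block: int) -> bool:
        home = block % self.num_sets
        partner = home ^ self.mask
        if self.lines[home] == block or self.lines[partner] == block:
            return True  # hit in the home set or in the expanded partner set
        self.misses += 1
        if self.lines[home] is None:
            self.lines[home] = block
        elif self.lines[partner] is None:
            self.lines[partner] = block  # expand into the partner set
        else:
            self.lines[home] = block  # simplest policy: replace the home line
        return False


def count_misses(mask: int, trace: List[int], num_sets: int = 8) -> int:
    cache = ReconfigurableExpandableCache(num_sets, mask)
    for block in trace:
        cache.access(block)
    return cache.misses


# Blocks 0/8 map to set 0 and blocks 1/9 map to set 1 (8 sets).
trace = [0, 8, 1, 9] * 10
fixed_misses = count_misses(1, trace)  # partner sets 0<->1 collide: thrashing
tuned_misses = count_misses(2, trace)  # partner sets 0<->2, 1<->3: only cold misses
print(fixed_misses, tuned_misses)  # 39 4
```

With a single fixed scheme (mask 1) the two busy sets expand into each other and keep evicting one another, while a scheme chosen for this access pattern (mask 2) leaves only the four cold misses. In a run-time reconfigurable design, a `configure` call would be placed at each program-phase boundary so that every phase runs under the scheme that suits it.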
Second, a thermal-aware memory mapping technique for 3D designs is proposed. DRAM is usually used as main memory for program execution. The thermal behavior of a memory block in a 3D design is affected not only by its power behavior but also by its heat dissipating ability. The power behavior of a block is determined by the applications running on the system, while its heat dissipating ability is determined by the number of tiers and the position of the block in the stack. A thermal-aware memory allocator should therefore consider two points: first, the physical location of a block as well as the power behavior of the logical block mapped to it; second, the changing temperature of a physical block during program execution. In this dissertation, we propose a memory mapping algorithm that takes both points into consideration. Our technique is a form of static thermal management applied to embedded software design. Experiments show that, for single-core systems, our method reduces the peak temperature of the memory system by 17.1℃ in the best case and by 13.3℃ on average, compared with a straightforward mapping. For systems with 4 cores, the peak-temperature reductions are 9.9℃ and 11.6℃ on average when the L1 cache of each core is set to 4KB and 8KB, respectively.
Finally, the recovery of failed TSVs is discussed. TSVs provide vertical communication links between dies and are a critical design issue in 3D integration. Like other components, TSVs can fail during fabrication and bonding. As the number of stacked dies increases, failed TSVs can severely increase cost and decrease yield. A redundant TSV architecture with reasonable cost is proposed in this dissertation, and several findings are reported based on probabilistic models.
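Returning to the thermal-aware memory mapping described above, the sketch below shows how an allocator can weigh both the power behavior of a logical block and the heat dissipating ability of a physical location, while tracking temperature over time. It is not the dissertation's algorithm. It assumes a hypothetical lumped RC model per physical block (thermal resistance `R` grows for tiers farther from the heat sink), ignores heat coupling between neighboring blocks, and uses a simple greedy pairing; `peak_temperature`, `thermal_aware_mapping`, `T_AMB`, and all numbers are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

T_AMB = 45.0  # hypothetical heat-sink/ambient temperature (deg C)


def peak_temperature(power: Sequence[float], r: float, c: float,
                     dt: float = 1.0, t_amb: float = T_AMB) -> float:
    """Peak temperature of one physical block under a lumped RC model,
    integrated with forward Euler: dT/dt = (P - (T - T_amb) / R) / C.
    Assumes dt / (R * C) < 1 so the integration stays stable."""
    temp = peak = t_amb
    for p in power:
        temp += dt * (p - (temp - t_amb) / r) / c
        peak = max(peak, temp)
    return peak


def thermal_aware_mapping(powers: List[List[float]],
                          phys: List[Tuple[float, float]]) -> Dict[int, int]:
    """Greedy: map logical blocks in decreasing order of total energy, each to the
    free physical block (R, C) that keeps its simulated peak temperature lowest."""
    order = sorted(range(len(powers)), key=lambda i: sum(powers[i]), reverse=True)
    free = list(range(len(phys)))
    mapping: Dict[int, int] = {}
    for i in order:
        best = min(free, key=lambda j: peak_temperature(powers[i], *phys[j]))
        mapping[i] = best
        free.remove(best)
    return mapping


def max_peak(mapping: Dict[int, int], powers: List[List[float]],
             phys: List[Tuple[float, float]]) -> float:
    return max(peak_temperature(powers[i], *phys[j]) for i, j in mapping.items())


# Logical block 1 is accessed heavily; physical block 0 sits next to the heat
# sink (low R), physical block 1 sits on an upper tier (high R).
powers = [[0.1] * 50, [1.0] * 50]
phys = [(1.0, 5.0), (3.0, 5.0)]
naive = {0: 0, 1: 1}  # straightforward mapping: logical i -> physical i
aware = thermal_aware_mapping(powers, phys)
print(aware, max_peak(naive, powers, phys), max_peak(aware, powers, phys))
```

Because each logical block carries a per-interval power trace rather than a single number, the same routine can account for temperature that changes across program phases; a fuller model would add lateral and vertical coupling between blocks.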
First, according to the probabilistic analysis, the number of failed TSVs in a tier is usually no more than 2 when a tier has fewer than 1000 TSVs, and no more than 5 when it has fewer than 10000 TSVs. Assuming at most 2 to 5 failed TSVs per tier, which covers 99.9% of all cases, and allocating one redundant TSV to each TSV block, the proposed structure achieves recovery rates of 90% and 95% for TSV blocks of size 50 and 25, respectively. Finally, analysis of the overall yield shows that the proposed design can recover most failed chips and increase the TSV yield to 99.4%.
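The recovery-rate finding above rests on a probabilistic argument: with one spare per TSV block, a tier is repairable only if no block contains two or more failed TSVs. The sketch below computes that probability exactly under simple assumptions of our own (failures uniformly random over the signal TSVs, spares never fail, tier size a multiple of the block size), plus a binomial distribution for the number of failures given a hypothetical per-TSV failure probability `p_fail`. It is not claimed to reproduce the 90% and 95% figures, which come from the dissertation's own model.

```python
from math import comb
from typing import List


def recovery_probability(n_tsv: int, block_size: int, k_failed: int) -> float:
    """Probability that k failed TSVs can all be repaired when every block of
    `block_size` signal TSVs owns one spare, i.e. no block holds two failures.

    Assumptions: failures are uniformly random over the n_tsv signal TSVs,
    spares never fail, and n_tsv is a multiple of block_size.
    """
    blocks = n_tsv // block_size
    if k_failed > blocks:
        return 0.0
    # Choose k distinct blocks, then one failed TSV inside each chosen block.
    favorable = comb(blocks, k_failed) * block_size ** k_failed
    return favorable / comb(n_tsv, k_failed)


def failure_count_distribution(n_tsv: int, p_fail: float, k_max: int) -> List[float]:
    """Binomial P(exactly k failed TSVs in a tier) for k = 0..k_max,
    with a hypothetical independent per-TSV failure probability p_fail."""
    return [comb(n_tsv, k) * p_fail ** k * (1 - p_fail) ** (n_tsv - k)
            for k in range(k_max + 1)]


for size in (50, 25):
    rates = [recovery_probability(1000, size, k) for k in range(2, 6)]
    print(size, [round(r, 3) for r in rates])
```

For two failures among 1000 TSVs this gives 950/999 ≈ 0.951 with blocks of 50 and 975/999 ≈ 0.976 with blocks of 25, illustrating why smaller blocks recover more failures at the cost of more spares.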

Keywords

Power; Yield; Reliability; 3D IC; Architecture; Cache; VLSI; Thermal; Through Silicon Via

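Parallel Abstract

With advances in semiconductor process technology, advanced electronic integration technologies such as System in Package (SIP) and Through Silicon Via (TSV) allow designers to integrate more electronic components into a single design and to develop increasingly powerful and complex electronic systems within a very small area. Although these technologies bring many benefits, they also introduce new challenges in semiconductor design and manufacturing. Increasing the number of electronic components per unit area also increases power density, which leads to thermal problems that significantly affect the reliability of today's electronic systems. Low-power system architectures and thermal management mechanisms will therefore be indispensable in future electronic systems. In addition, advanced integration technologies require more complex fabrication processes, and maintaining high yield while using them is very important to the electronics industry. In this dissertation, we propose new system architectures and optimization methods at three levels, namely cache, main memory, and 3D interconnect, to improve power, reliability, and yield.
First, for the cache memory system, we develop a run-time reconfigurable expandable cache architecture. The expandable cache proposed by G. Bournoutian and A. Orailoglu effectively reduces cache misses and energy consumption in embedded systems at a small area cost. However, the original expandable cache uses only one fixed expansion scheme, which can cause severe thrashing. To address this, we build on the original expandable cache and develop a cache whose expansion scheme changes according to the dynamic execution behavior of the program. By executing special configuration instructions inserted at compile time, the expansion scheme can be changed at run time. Experimental results on SPEC 2000 show that, compared with the original expandable cache, the proposed cache improves the miss rate by 14.74%. In terms of energy consumption, with a 2-way set-associative cache as the baseline, the proposed architecture is 5.62% better than the original expandable cache.
Next, for the main memory system in 3D designs, we propose a thermal-aware memory mapping technique. In a 3D design, the thermal behavior of a physical memory block depends on its power behavior and its heat dissipating ability. The power behavior is mainly determined by the access characteristics of the running program, while the heat dissipating ability is determined by the physical location of the memory block. A thermal-aware memory allocator must therefore have two properties. First, it must consider both the power behavior and the heat dissipating ability of a memory block. Second, it must account for the temperature changes of a physical memory block during application execution. In this dissertation, we propose a memory mapping algorithm that considers both properties. Our method is a static thermal management mechanism intended mainly for the software design flow of embedded systems. Experimental results show that, for single-core systems, our method reduces the peak temperature of the memory system by 17.1℃ in the best case and by 13.3℃ on average, compared with a straightforward memory mapping. For 4-core systems, when the L1 cache is set to 4KB and 8KB, the peak temperature of the memory system is reduced by 9.9℃ and 11.6℃, respectively.
Finally, for TSV interconnects, we propose a repair mechanism. In 3D designs, TSVs are mainly used for vertical signal connections and are therefore critical to 3D chips. Like other semiconductor components, however, TSVs may fail during fabrication and bonding because of process problems. As more dies are stacked with 3D integration, process-induced TSV failures severely affect manufacturing yield and cost. To address TSV failures, this dissertation proposes a cost-effective repair architecture. Through analysis of probabilistic models, we summarize several important observations. First, when the number of tier-to-tier TSVs is below 1000 and 10000, the number of failed tier-to-tier TSVs is usually no more than 2 and 5, respectively. Assuming that at most 2 to 5 tier-to-tier TSVs fail covers 99.9% of all cases (including cases with and without failed TSVs). Under this assumption, when each TSV block is allocated one redundant TSV, limiting the TSV block size to 50 and 25 TSVs achieves recovery rates of 90% and 95%, respectively (for cases with failed TSVs). In terms of overall yield, the proposed repair architecture can recover the vast majority of chips that fail because of TSV defects and raises the TSV yield to 99.4%.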

Parallel Keywords

Power; Yield; Reliability; 3D IC; Architecture; Cache; VLSI; Thermal; Through Silicon Via

References

[2] G. Bournoutian and A. Orailoglu, “Miss Reduction in Embedded Processors Through Dynamic, Power-Friendly Cache Design,” in Proc. Design Automation Conference (DAC), June 2008, pp. 304-309.

[3] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel, “Scratchpad Memory: A Design Alternative for Cache On-Chip Memory in Embedded Systems,” in Proc. 10th International Symposium on Hardware/Software Codesign, New York, 2002.

[4] J. Kin, M. Gupta, and W. H. Mangione-Smith, “The Filter Cache: An Energy Efficient Memory Structure,” in Proc. 30th International Symposium on Microarchitecture, 1997, pp. 184-193.

[5] A. Janapsatya, S. Parameswaran, and A. Ignjatovic, “Hardware/Software Managed Scratchpad Memory for Embedded System,” in Proc. International Conference on Computer-Aided Design (ICCAD), 2004.

[6] C. T. Wu, A.-C. Hsieh, and T. Hwang, “Instruction Buffering for Nested Loops in Low-Power Design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 7, July 2006, pp. 780-784.
