Leveraging Adaptive Set-Configurable Architecture to Reduce Conflict Misses of PCM Last-Level Cache in CMP

Advisor: Tien-Fu Chen (陳添福)

Abstract


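As the number of cores on a processor chip increases, the last-level cache (LLC) plays an important role in bridging the latency gap between the cores and main memory. The number of cores grows according to Moore's law, while off-chip main-memory bandwidth grows more slowly than the core count, so limited bandwidth has become a bottleneck that hinders many-core scaling. In addition, the cache hierarchy accounts for a large portion of the processor's total power consumption. For the LLC of many-core systems, non-volatile memory (NVM) technologies offer several advantages, such as high storage density, low leakage current, and non-volatility. In this thesis, we study phase-change memory (PCM), one of the most promising emerging non-volatile random-access memory technologies, and apply it to the LLC of many-core architectures.

We consider the problems that arise when PCM, which is denser than SRAM, is used as the data array of the LLC. First, a denser memory technology provides more entries but also incurs a large tag-array overhead. Second, a larger cache capacity resolves some, but not all, of the conflict misses caused by address mapping. Third, the lifetime of PCM is much shorter than that of SRAM and DRAM, so endurance is a more serious concern. This thesis proposes an Adaptive Set-Configurable Architecture, which reduces LLC conflict misses by observing the state of each cache set and dynamically uniting or dividing the ways in each set.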
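To make the tag-array overhead concrete, the following Python sketch computes the total tag storage of a set-associative cache. All parameters are illustrative assumptions, not values from the thesis: a 48-bit physical address, 64-byte blocks, 16 ways, an 8 MiB SRAM LLC, and a 32 MiB PCM LLC standing in for roughly the same area (a hypothetical 4× density ratio). The function name `tag_array_bits` is hypothetical, and it counts tag bits only, ignoring valid, dirty, coherence, and replacement state.

```python
def tag_array_bits(capacity_bytes, block_bytes, ways, addr_bits):
    """Total tag storage, in bits, of a set-associative cache.

    Counts tag bits only (no valid/dirty/coherence/replacement bits).
    capacity_bytes, block_bytes and ways must be powers of two.
    """
    for name, v in (("capacity_bytes", capacity_bytes),
                    ("block_bytes", block_bytes), ("ways", ways)):
        if v <= 0 or v & (v - 1):
            raise ValueError(f"{name} must be a power of two")
    blocks = capacity_bytes // block_bytes
    sets = blocks // ways
    if sets == 0:
        raise ValueError("cache must have at least one set")
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return blocks * tag_bits


MiB = 1 << 20
# Illustrative, hypothetical parameters; the thesis does not specify these.
sram = tag_array_bits(8 * MiB, 64, 16, 48)    # SRAM-sized LLC
pcm = tag_array_bits(32 * MiB, 64, 16, 48)    # 4x denser PCM LLC (assumed)
print(sram // 8 // 1024, "KiB")  # 464 KiB
print(pcm // 8 // 1024, "KiB")   # 1728 KiB
```

Under these assumptions the tag array grows from 464 KiB to 1728 KiB, about 3.7×. Each block's tag shrinks from 29 to 27 bits because two more address bits go to the set index, but the block count grows 4×, so total tag storage still rises almost in proportion to capacity.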

Parallel Abstract


As the number of cores on a chip increases, last-level caches (LLCs) play an important role in bridging the latency gap between processor cores and main memory. Bandwidth limitation has become a bottleneck for many-core scaling, since off-chip memory bandwidth grows slowly compared with the number of cores, which grows according to Moore's law. In addition, the cache hierarchy consumes a large portion of a processor's total power. Non-volatile memory (NVM) technologies offer several advantages for LLCs in many-core systems, namely high density, low leakage, and non-volatility. In this work, we study phase-change memory (PCM), one of the most promising emerging non-volatile random-access memory technologies, as a replacement for existing on-chip LLCs in many-core architectures.

We consider the following problems that arise when PCM, which is denser than SRAM, is used as the LLC data array. First, using a denser memory technology to provide more entries may also incur a very large tag-array overhead. Second, more cache capacity may eliminate some, but not all, of the conflict misses caused by address mapping. Third, the lifetime of a PCM cell is much shorter than that of an SRAM or eDRAM cell, so endurance is a serious issue. This work therefore proposes the Adaptive Set-Configurable Architecture and Run-time Set-Configuration Decision, which reduce LLC conflict misses by observing the state of each cache set and adaptively uniting or dividing the ways in the set.
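The conflict-miss problem and the idea of uniting ways can be illustrated with a toy simulator. The sketch below is a hypothetical model, not the thesis's architecture: sets are paired as (2k, 2k+1), and a "united" pair shares one LRU pool of 2 × ways entries, so a hot set can borrow its neighbor's ways without any added capacity. The class name `ToySetCache`, the pairing rule, and the trace are assumptions made for illustration; the run-time decision of when to unite or divide sets is not modeled.

```python
from collections import OrderedDict


class ToySetCache:
    """Toy set-associative LRU cache (hypothetical model, not the thesis design).

    Sets are grouped into pairs (2k, 2k+1). When pair k is 'united', both set
    indices share a single LRU pool of 2 * ways entries instead of two
    separate pools of `ways` entries each. Total capacity is unchanged.
    """

    def __init__(self, num_sets, ways, united_pairs=()):
        assert num_sets % 2 == 0, "sets are paired, so num_sets must be even"
        self.num_sets = num_sets
        self.ways = ways
        self.united = set(united_pairs)  # pair ids k whose sets 2k, 2k+1 are united
        self.pools = {}                  # pool key -> OrderedDict in LRU order
        self.hits = 0
        self.misses = 0

    def _pool(self, block):
        """Return (pool key, pool capacity) for a block address."""
        set_idx = block % self.num_sets  # conventional modulo set indexing
        pair = set_idx // 2
        if pair in self.united:
            return ("pair", pair), 2 * self.ways
        return ("set", set_idx), self.ways

    def access(self, block):
        key, capacity = self._pool(block)
        pool = self.pools.setdefault(key, OrderedDict())
        if block in pool:
            pool.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(pool) >= capacity:
            pool.popitem(last=False)  # evict the least recently used block
        pool[block] = None
        return False


def run(cache, trace):
    for block in trace:
        cache.access(block)
    return cache.misses


# Hypothetical trace: six block addresses with stride 16, repeated ten times.
trace = [16 * i for i in range(6)] * 10
print(run(ToySetCache(8, 4), trace))                    # 60: set 0 thrashes
print(run(ToySetCache(16, 4), trace))                   # 60: 2x capacity, still one set
print(run(ToySetCache(8, 4, united_pairs={0}), trace))  # 6: compulsory misses only
```

With this trace every block maps to set 0. A 4-way set thrashes under LRU, and doubling the number of sets does not help because the blocks still share one index. Uniting set 0 with set 1 leaves only the six compulsory misses. In this model a lookup in a united pair must check all ways of both sets, and deciding which pairs to unite at run time is left out.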
