Improving DRAM Cache Hit Rate and Performance via Adaptive Granularity Block Size Management

Advisor: 陳添福

Abstract


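Hit rate and access time are the two most critical factors that determine cache performance. To improve cache performance, the research community has proposed various tag management mechanisms for caches with a fixed granularity. However, the performance gains of fixed-granularity architectures are limited. A cache with a small fixed granularity requires more tag storage yet achieves only a modest hit rate. A cache with a large fixed granularity achieves a higher hit rate but often wastes bandwidth by fetching unneeded data. To obtain the benefits of both large and small granularity, we propose an adaptive-granularity cache architecture and a management mechanism for it. Our design not only saves tag storage but also improves the hit rate by 25%. In addition, the prefetching mechanism proposed in this thesis improves the hit rate by 32%. Furthermore, our experimental results show that the adaptive-granularity cache improves average performance by 45% over existing fixed-granularity cache architectures. It also improves performance by 7% over the ideal architecture that keeps tags in SRAM.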
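The following minimal Python sketch makes the tag-storage versus bandwidth tradeoff concrete. It works through the arithmetic under assumed parameters, none of which come from the thesis:

- a 256 MiB direct-mapped DRAM cache;
- 48-bit physical addresses;
- two metadata bits (valid and dirty) per tag entry;
- 64-byte CPU requests.

The helper names `tag_storage_bytes` and `wasted_fetch_bytes` are illustrative.

```python
# Hypothetical parameters for illustration only; they are not taken from the thesis.
CACHE_BYTES = 256 * 2**20  # assumed 256 MiB direct-mapped DRAM cache
ADDR_BITS = 48             # assumed physical address width
META_BITS = 2              # assumed valid + dirty bit per tag entry
REQUEST_BYTES = 64         # assumed size of one CPU request (one cache line)


def tag_storage_bytes(block_bytes: int) -> int:
    """Size of the tag array for a direct-mapped cache with the given block size."""
    entries = CACHE_BYTES // block_bytes
    # In a direct-mapped cache, offset bits + index bits = log2(CACHE_BYTES),
    # so each entry stores the same number of tag bits whatever the block size.
    tag_bits = ADDR_BITS - (CACHE_BYTES.bit_length() - 1)
    return entries * (tag_bits + META_BITS) // 8


def wasted_fetch_bytes(block_bytes: int, requests_used: int) -> int:
    """Bytes brought in on a miss that are never requested before eviction."""
    used = min(requests_used * REQUEST_BYTES, block_bytes)
    return block_bytes - used


for block in (64, 512, 2048):
    tags_kib = tag_storage_bytes(block) / 2**10
    waste = wasted_fetch_bytes(block, requests_used=2)
    print(f"{block:5d} B blocks: tag array {tags_kib:8.0f} KiB, "
          f"wasted fetch if only 2 lines are used: {waste} B")
```

Under these assumptions, the tag array shrinks from 11 MiB with 64 B blocks to 352 KiB with 2 KiB blocks. However, a 2 KiB block of which only two 64 B lines are used wastes 1920 B of fetch bandwidth on every miss.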

Keywords

Cache Memory, Granularity, Adaptive, Hit Rate

English Abstract


Hit rate and access latency are the two most crucial factors that determine the performance of on-chip stacked DRAM. Various fixed-granularity DRAM cache tag management designs have been proposed to improve overall stacked DRAM performance. However, a small fixed block size incurs high tag storage overhead and fails to exploit spatial locality. On the other hand, coarse-grained stacked DRAM offers a higher hit rate at the cost of wasted bandwidth from fetching unused blocks. We propose an adaptive-granularity DRAM cache block management scheme that gains the benefits of both small and coarse granularity while mitigating the disadvantages of each. Our design not only reduces tag storage size but also improves the hit rate by over 25%, and it reduces bandwidth wastage along with the miss rate. We add a block prefetching mechanism on top of our design to further optimize overall system performance. Our experimental results show that block prefetching achieves a 32% higher hit rate than the fixed-granularity designs. Moreover, we achieve average performance gains of 45% over a state-of-the-art fixed-granularity DRAM cache and 7% over the ideal tags-in-SRAM design, measured in terms of reduced miss penalty.
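The toy Python model below illustrates the general idea of adaptive-granularity block management. It is a hypothetical, simplified footprint-style policy and not the mechanism proposed in this thesis.

The model makes these assumptions:

- The cache is a fully associative LRU cache of 2 KiB regions.
- Each region has one tag and a valid bit per 64 B sub-block.
- On a miss, it fetches the sub-blocks the region used during its previous residency, plus the demanded one.

The names `AdaptiveGranularityCache`, `SUBS_PER_REGION` and `capacity_regions` are illustrative.

```python
from collections import OrderedDict

SUB_BLOCK = 64         # bytes per sub-block (assumed)
SUBS_PER_REGION = 32   # 32 x 64 B = 2 KiB region sharing one tag (assumed)


class AdaptiveGranularityCache:
    """Toy LRU cache: one tag per region, one valid bit per 64 B sub-block.

    Hypothetical footprint-style policy for illustration: on a miss, fetch the
    sub-blocks the region used during its previous residency plus the demanded
    one; with no history, fetch only the demanded sub-block.
    """

    def __init__(self, capacity_regions: int):
        self.capacity = capacity_regions
        self.regions = OrderedDict()  # region id -> [valid_mask, used_mask]
        self.history = {}             # region id -> used_mask at last eviction
        self.hits = self.misses = self.fetched_bytes = 0

    def access(self, addr: int) -> bool:
        """Simulate one 64 B request; return True on a hit."""
        region, sub = divmod(addr // SUB_BLOCK, SUBS_PER_REGION)
        bit = 1 << sub
        entry = self.regions.get(region)
        if entry is not None and entry[0] & bit:
            self.regions.move_to_end(region)  # refresh LRU position
            entry[1] |= bit
            self.hits += 1
            return True

        self.misses += 1
        fetch = self.history.get(region, 0) | bit  # predicted footprint + demand
        if entry is None:
            if len(self.regions) >= self.capacity:
                # Evict the LRU region and remember which sub-blocks it used.
                victim, (_, used) = self.regions.popitem(last=False)
                self.history[victim] = used
            entry = self.regions[region] = [0, 0]
        else:
            self.regions.move_to_end(region)
        fetch &= ~entry[0]  # skip sub-blocks that are already valid
        entry[0] |= fetch
        entry[1] |= bit
        self.fetched_bytes += bin(fetch).count("1") * SUB_BLOCK
        return False
```

For example, take `capacity_regions=1` and the access sequence 0, 64, 2048, 0, 64. This yields four misses and one hit. The second visit to region 0 fetches both of its previously used sub-blocks at once, so the final access to address 64 hits.

Keeping one tag per 2 KiB region instead of per 64 B line cuts the number of tag entries by 32x. The cost is a 32-bit valid mask per entry.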

English Keywords

DRAM Cache, Granularity, Adaptive, Hit Rate
