Controller Design for a Low Power, Low Latency DRAM with Built-in Cache

Advisor: Cheng-Wen Wu (吳誠文)

Abstract


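Because of the "memory wall," memory performance remains one of the major bottlenecks in today's computer systems. While reducing memory power consumption and latency, DRAM manufacturers also want to keep production cost low. New DRAM architectures have therefore been proposed in recent years to address these problems; those that use an asymmetric bitline structure are called Tiered-Latency DRAM. The memory used in our experiments is structurally very similar to this class of DRAM, but we use the small array inside the memory array as a cache, and we give it a new name: Built-in Cache DRAM. We also propose suitable algorithms to make the most of this memory architecture.

In the proposed DRAM controller design, data is accessed through the smaller array as often as possible. If the small array does not hold the requested data, the data is moved from the larger array into the small array. When the small array is full and a new DRAM address must be accessed, an appropriate algorithm has to choose one stored entry in the small array to evict. Three eviction algorithms are supported: "first-in-first-out," "least-used-first-out," and "earliest-used-first-out."

We then used a modified version of DRAMSim2, a cycle-accurate memory system simulator, to verify the controller together with the DRAM model. The baseline specification in the simulation follows the 3D Wide I/O DRAM standard, and the results for the whole memory system show that the Built-in Cache DRAM does have lower latency and lower power consumption than an ordinary Wide I/O DRAM. We also studied different size ratios between the small and large arrays, different address access rules, and different numbers of cache ways (set associativity), and compared the results.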

Parallel Abstract


Memory system performance is still a significant bottleneck in today's computer systems because of the memory wall. While reducing power and latency, DRAM vendors also want to keep production cost low. A new type of DRAM with asymmetric bitlines, called Tiered-Latency DRAM (TL-DRAM), was therefore proposed recently. Our target DRAM is similar to TL-DRAM, but we operate the small array as a cache, so we call it Built-in Cache DRAM (BC-DRAM). In this work we propose a controller design with appropriate algorithms to get the most out of the BC-DRAM.

In the proposed DRAM controller design, data should be accessed from the small array as often as possible. If the small array does not contain the requested data, that data is migrated from the large array to the small array. When the small-array cache is full and a new row must be stored in it, the controller has to choose a victim row to replace. Three replacement policies are integrated in the controller design to choose the victim: first-in-first-out (FIFO), least-used-first-out (LUFO), and earliest-used-first-out (EUFO).

We modified DRAMSim2, a cycle-accurate memory system simulator, to test the controller algorithms together with a DRAM model. Based on the Wide-IO 3D DRAM specifications, our experimental results show that BC-DRAM with the proposed controller consumes less power and achieves lower latency than a typical Wide-IO DRAM. Further experiments show the effects of different configurations, such as the sizes of the small and large arrays, the address scrambling rules, and the set associativity (number of ways).
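The replacement step described above can be illustrated with a small sketch. It is not the thesis controller or any part of DRAMSim2: the class `SmallArrayCache`, the function `simulate`, and the policy names are hypothetical, and the abstract does not define the policies precisely. The sketch assumes that least-used-first-out evicts the row with the fewest accesses since it was brought in, and that earliest-used-first-out evicts the row whose most recent access is the oldest. It models a single fully associative small array and counts hits and misses only, with no timing or power.

```python
class SmallArrayCache:
    """Toy model of the small array used as a row cache (hypothetical sketch)."""

    POLICIES = ("fifo", "least_used", "earliest_used")

    def __init__(self, capacity, policy="fifo"):
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: " + policy)
        self.capacity = capacity
        self.policy = policy
        # row address -> [insert_time, use_count, last_use_time]
        self.rows = {}
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def _pick_victim(self):
        # Assumed meanings: fifo -> oldest insertion, least_used -> fewest
        # accesses, earliest_used -> oldest most-recent access.
        # Ties are broken by insertion time.
        field = self.POLICIES.index(self.policy)
        return min(self.rows, key=lambda r: (self.rows[r][field], self.rows[r][0]))

    def access(self, row):
        """Return True on a hit. On a miss, bring the row in, evicting if full."""
        self.clock += 1
        if row in self.rows:
            self.hits += 1
            self.rows[row][1] += 1
            self.rows[row][2] = self.clock
            return True
        self.misses += 1
        if len(self.rows) >= self.capacity:
            del self.rows[self._pick_victim()]
        # Models the migration from the large array into the small array.
        self.rows[row] = [self.clock, 1, self.clock]
        return False


def simulate(trace, capacity, policy):
    """Replay a list of row addresses and return (hits, misses)."""
    cache = SmallArrayCache(capacity, policy)
    for row in trace:
        cache.access(row)
    return cache.hits, cache.misses


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 1, 4, 3, 4, 3]
    for name in SmallArrayCache.POLICIES:
        print(name, simulate(trace, 2, name))
```

Under these assumed definitions the three policies can choose different victims on the same trace, which is why a comparison across access patterns, such as the one the abstract reports, is informative.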

