
Low Power Instruction Cache and Cost-Effective Decoder of Electron-Beam Direct-Write Systems

Advisor: 陳中平

Abstract

Power consumption is a critical concern in modern processors. Instruction fetch, together with the instruction cache, is an energy hot spot, so reducing the dynamic power of instruction cache lookups improves processor energy efficiency. This dissertation proposes an energy-efficient instruction cache lookup technique with low area overhead, called Dynamic Early Tag Lookup (DETL). DETL exploits fetch bubble cycles to perform an early tag lookup that identifies the matching way of the indexed cache set. Because the matching way is known in advance, the dynamic energy otherwise spent on accessing the other cache memory banks in parallel can be saved. We implement DETL on a four-way set-associative I-cache in a RISC-V micro-architecture and evaluate it with the SPEC CPU2006 benchmark suite. We observe a 19.38% reduction in dynamic power with less than 0.1% area overhead.

Data throughput is a critical metric in a Multiple Electron-Beam Direct-Write (MEBDW) system, so high-performance data processing equipment is required. The main challenge is to achieve high performance with cost-effective techniques. This dissertation proposes a compression algorithm with a high compression rate for efficient data transfer, together with high-speed decompression hardware, to raise the data throughput of the system. The hardware decoder uses a pipeline architecture, a run-length encoding First-In-First-Out (FIFO) queue, and parallel dispatch logic to increase throughput. The decoder is evaluated on a Field-Programmable Gate Array (FPGA) and simulated with layout images compressed by the proposed compression software. The results show an 18.2% higher compression rate and 254.8% higher throughput than previous work at similar hardware cost. Because the design uses no Static Random-Access Memory (SRAM), the number of channels in the system can be scaled up easily, which makes it possible for next-generation MEBDW systems to achieve higher Wafers Per Hour (WPH) targets.
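The energy argument behind early tag lookup can be illustrated with a toy model. The sketch below is not the dissertation's implementation. It assumes a hypothetical function `count_data_way_reads`, round-robin replacement, and a simplified rule: a fetch preceded by a bubble cycle has its tag resolved early, so it reads only the matching data way on a hit and no data way on a miss, while every other fetch reads all ways in parallel. The cache geometry (64 sets, 32-byte lines) is likewise an assumption chosen for illustration.

```python
def count_data_way_reads(fetches, num_sets=64, ways=4, line_bytes=32):
    """Count data-way reads for a conventional parallel lookup and for a
    simplified early-tag-lookup scheme (illustrative assumption only).

    fetches: iterable of (address, has_bubble) pairs, where has_bubble says
    whether an idle fetch cycle was available to resolve the tag early.
    Returns (baseline_reads, early_reads).
    """
    tags = [[None] * ways for _ in range(num_sets)]  # tag store per set
    victim = [0] * num_sets                          # round-robin pointers
    baseline = 0
    early = 0
    for addr, has_bubble in fetches:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        hit = tag in tags[index]

        # Conventional lookup: all data ways are read in parallel.
        baseline += ways

        if has_bubble:
            # Tag already resolved: read one way on a hit, none on a miss.
            early += 1 if hit else 0
        else:
            # No idle cycle available: fall back to the parallel access.
            early += ways

        if not hit:
            # Fill the line using round-robin replacement (assumed policy).
            tags[index][victim[index]] = tag
            victim[index] = (victim[index] + 1) % ways
    return baseline, early
```

For example, `count_data_way_reads([(0x0, False), (0x0, True), (0x4, True), (0x1000, False)])` returns `(16, 10)`: the two fetches that follow a bubble each read one way instead of four. The model only counts way accesses and says nothing about the 19.38% figure reported above, which comes from the author's own evaluation.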

