Low-Power Instruction Cache and Cost-Effective Decoder for Electron-Beam Direct-Write Systems

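This dissertation proposes an energy-efficient, low-cost instruction cache lookup technique called Dynamic Early Tag Lookup (DETL). DETL uses idle cycles to perform an early tag lookup for the cache index. Knowing in advance which way matches saves the dynamic power otherwise spent on accessing the other cache memory banks in parallel. We implement DETL on a four-way set-associative instruction cache in a RISC-V microarchitecture and evaluate it with the SPEC CPU2006 benchmark suite. We observe a 19.38% reduction in dynamic power with a cost increase of less than 0.1%.

Data throughput is a key metric in a Multiple Electron-Beam Direct-Write (MEBDW) system, so high-performance data processing equipment is required. The main challenge is achieving high performance with cost-effective techniques. This dissertation proposes a data transfer algorithm with a high compression rate and a hardware implementation of high-speed decompression to raise the system's data throughput. The hardware decoder uses a pipeline architecture, a run-length encoding First-In-First-Out (FIFO) buffer array, and parallel dispatch logic to increase throughput. The decoder is evaluated on an FPGA and simulated with layout images compressed by the proposed compression algorithm. The results show that, compared with the previous approach and at similar hardware cost, the compression rate improves by 18.2% and the data throughput by 254.8%. Because the design uses no Static Random-Access Memory (SRAM), the number of channels in the system can be scaled up easily, which makes higher Wafers Per Hour (WPH) manufacturing targets feasible for next-generation MEBDW systems.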

Keywords

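dynamic early tag lookup; dynamic power reduction; energy-efficient processor; set-associative instruction cache; multiple electron-beam direct-write; data compression; hardware decoder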

Parallel Abstract

Power consumption is a primary concern in modern processors. Instruction fetch, together with the instruction cache, is an energy hot spot, so reducing the dynamic power of instruction cache lookup can improve processor energy efficiency. We propose an energy-efficient instruction cache lookup technique with low area overhead called Dynamic Early Tag Lookup (DETL). DETL exploits a fetch bubble cycle to perform an early tag lookup for the index of the matching cache set. The dynamic energy otherwise spent on parallel accesses to the other cache memory banks can therefore be saved. We implement DETL on a four-way set-associative I-cache in a RISC-V microarchitecture and test its performance using the SPEC CPU2006 benchmark suite. We observe a 19.38% dynamic power reduction with less than 0.1% area overhead.

Data throughput is a critical metric in a Multiple Electron-Beam Direct-Write (MEBDW) system, so high-performance data processing equipment is required. The main challenge is achieving high performance with cost-effective techniques. In this dissertation, we propose a high-compression-rate algorithm for efficient data transfer and high-speed decompression hardware to raise the data throughput of the system. The hardware decoder uses a pipeline architecture, a run-length encoding First-In-First-Out (FIFO) queue, and parallel dispatch logic to increase throughput. The decoder is evaluated on a Field-Programmable Gate Array (FPGA) and simulated with layout images compressed by our proposed compression software. The results demonstrate an 18.2% better compression rate and 254.8% better throughput than the previous work at similar hardware cost. Because no Static Random-Access Memory (SRAM) is used in the design, the number of channels in the system can be scaled up easily, which makes it possible for next-generation MEBDW systems to achieve higher Wafers Per Hour (WPH) targets.
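The saving that DETL targets can be illustrated with a toy access-count model. The sketch below is not the dissertation's implementation: it assumes every fetch hits, that a conventional four-way lookup reads all four data ways in parallel, and that a fetch preceded by a bubble cycle already knows the matching way and reads only that one. The names `AccessCounts`, `conventional_fetch`, `detl_fetch`, and `simulate` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass

WAYS = 4  # four-way set-associative I-cache, as stated in the abstract


@dataclass
class AccessCounts:
    tag_reads: int = 0
    data_reads: int = 0


def conventional_fetch(counts: AccessCounts) -> None:
    # Assumption: tags and all data ways are read in parallel on every fetch.
    counts.tag_reads += WAYS
    counts.data_reads += WAYS


def detl_fetch(counts: AccessCounts, early_lookup_done: bool) -> None:
    # Assumption: the tags are still read once per fetch, either early
    # (during the bubble cycle) or in the normal fetch cycle.
    counts.tag_reads += WAYS
    # If the early lookup already identified the matching way, only that
    # data way is read; otherwise fall back to the parallel read.
    counts.data_reads += 1 if early_lookup_done else WAYS


def simulate(bubble_flags):
    """bubble_flags[i] is True when fetch i was preceded by a bubble cycle."""
    base, detl = AccessCounts(), AccessCounts()
    for had_bubble in bubble_flags:
        conventional_fetch(base)
        detl_fetch(detl, had_bubble)
    return base, detl


if __name__ == "__main__":
    base, detl = simulate([True, False, True, True])
    print("conventional data reads:", base.data_reads)
    print("early-lookup data reads:", detl.data_reads)
```

In this model the number of tag reads is unchanged and only the data-way reads shrink, so the achievable saving depends on how often a bubble cycle precedes a fetch. The 19.38% figure reported above comes from the authors' own evaluation, not from this sketch.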

Parallel Keywords

dynamic early tag lookup; energy-efficient; instruction cache; multiple electron-beam direct-write; hardware decoder; data compression

