
Tolerating Load Miss Latency by Extending Effective Instruction Window with Low Complexity

Advisor: 鍾崇斌

Abstract


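Increasing the instruction window size conflicts with maintaining a high clock frequency. Run-ahead execution can expose more memory-level parallelism. However, because instruction dependences are hard to maintain across a large instruction window, the execution results produced in the run-ahead state are wasted. We propose a large instruction window design that, compared with a conventional out-of-order processor, uses only simple structures and simple management so that a high clock rate can be maintained. The main idea is that, when a long-latency cache-miss instruction is encountered, the instruction stream and part of the execution results are temporarily moved into a large, fast preserving instruction queue. The instruction window can therefore keep exploiting parallelism among future instructions while the long cache miss is outstanding. Experimental results show that, on a 4-way processor with a 1,000-entry preserving instruction queue, the design achieves speedups of 5% and 10% over the original run-ahead design for SPEC INT2000 and SPEC FP2000, respectively.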

Keywords

instruction window; latency tolerance; complexity

Parallel Abstract


The conflict between increasing the instruction window size and keeping the clock cycle time short is getting worse. Run-ahead execution eases this problem by exploiting higher memory-level parallelism (MLP). However, the execution results produced in the run-ahead state are wasted because dependences are difficult to maintain across a large instruction window. We propose a large instruction window design that keeps the cycle time of simple structures and is easy to manage. The main idea is that, when a long-latency load miss occurs, the instructions and their execution results are sequentially driven into a large, fast preserving buffer. Experimental results show that, with a 1K-entry preserving buffer, a 4-way processor using our design achieves speedups of 5% and 10% over the original run-ahead execution design for SPEC INT2000 and SPEC FP2000, respectively.
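
As a rough illustration of the main idea, the sketch below models a preserving buffer as a plain FIFO. It is a toy under stated assumptions, not the design evaluated in the thesis: the names `Instr`, `alu`, and `run_with_preserving_buffer` are hypothetical, registers are assumed to be already renamed so that each destination is written once, every non-load instruction simply adds its sources plus one, and the miss is assumed to return only after the whole toy program has been buffered. Instructions that do not depend on the missing load execute immediately and keep their results in the buffer; dependent ones wait and execute when the buffer is drained in program order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    dest: str                     # destination register (written once: assumption)
    srcs: tuple = ()              # source registers
    misses: bool = False          # hypothetical flag: a load with a long-latency miss
    result: Optional[int] = None  # preserved execution result, if already known


def alu(instr, regs):
    # Toy semantics (assumption): result = sum of source values + 1.
    return sum(regs[s] for s in instr.srcs) + 1


def run_with_preserving_buffer(program, regs, load_value, capacity=1024):
    regs = dict(regs)
    buffer = deque()   # FIFO standing in for the preserving buffer
    pending = set()    # registers whose values depend on the outstanding miss

    # Phase 1: the miss is outstanding; instructions stream into the buffer.
    for instr in program:
        if len(buffer) >= capacity:
            raise RuntimeError("buffer full; a real design would have to stall")
        if instr.misses or any(s in pending for s in instr.srcs):
            pending.add(instr.dest)            # cannot execute yet
        else:
            instr.result = alu(instr, regs)    # execute now, keep the result
            regs[instr.dest] = instr.result
        buffer.append(instr)
    preserved = sum(1 for i in buffer if i.result is not None)

    # Phase 2: the miss returns; drain the buffer in program order.
    while buffer:
        instr = buffer.popleft()
        if instr.result is None:               # only dependent work is executed here
            instr.result = load_value if instr.misses else alu(instr, regs)
            regs[instr.dest] = instr.result
    return regs, preserved


program = [
    Instr("r1", misses=True),      # long-latency load
    Instr("r2", ("r1",)),          # depends on the load
    Instr("r3", ("r0",)),          # independent
    Instr("r4", ("r3", "r0")),     # independent
    Instr("r5", ("r2", "r4")),     # depends on the load through r2
]
final_regs, preserved = run_with_preserving_buffer(program, {"r0": 10}, load_value=7)
```

In this example the two independent instructions (`r3` and `r4`) keep their results across the miss, which corresponds to the work that, according to the abstract, is wasted in the run-ahead state.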

References


[11] O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt, “Runahead execution: An alternative to very large instruction windows for out-of-order processors,” in HPCA ’03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture. Washington, DC, USA: IEEE Computer Society, 2003, p. 129.
[10] S. T. Srinivasan, R. Rajwar, H. Akkary, A. Gandhi, and M. Upton, “Continual flow pipelines,” in ASPLOS-XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems. New York, NY, USA: ACM Press, 2004, pp. 107–119.
[12] H. Akkary, R. Rajwar, and S. T. Srinivasan, “Checkpoint processing and recovery: Toward scalable large instruction window processors,” in MICRO ’03: Proceedings of the 36th International Symposium on Microarchitecture. Washington, DC, USA: IEEE Computer Society, 2003, p. 423.
[2] E. Riseman and C. Foster, “The inhibition of potential parallelism by conditional jumps,” IEEE Transactions on Computers, vol. C-21, no. 12, pp. 1405–1411, Dec. 1972.
[3] P. Michaud, A. Seznec, and S. Jourdan, “An exploration of instruction fetch requirement in out-of-order superscalar processors,” Int. J. Parallel Program., vol. 29, no. 1, pp. 35–58, 2001.
