A Cycle-Accurate, Execution-Driven GPU Simulation Framework

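Graphics processing units (GPUs) were designed to accelerate graphics computation, and their highly parallel hardware structure lets them implement graphics rendering algorithms more efficiently than central processing units (CPUs). Because GPUs must support increasingly complex functionality, and because vendors have not disclosed their detailed hardware architectures, current GPU designs have become harder to evaluate. To support research on GPU design, this thesis proposes a cycle-accurate, execution-driven GPU simulation framework. For more accurate simulation, the GPU simulator core in this framework is designed as a pipelined processor, together with a memory system that has an accurate timing model. The simulator core executes rendering commands translated from a sequence of OpenGL function calls and simulates GPU behavior in a cycle-accurate manner; these OpenGL calls are captured from real 3D games (for example, Quake 3). To demonstrate an application of the framework, this thesis also presents a study of graphics memory, applying different memory access scheduling policies and analyzing their effect on GPU performance. The experimental results show that an adaptive scheduling policy achieves the best performance.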

Keywords

Graphics processing unit; Simulator

Parallel Abstract

Graphics processing units (GPUs) are designed to accelerate graphics rendering. Their highly parallel structure makes them more effective than CPUs for a range of graphics rendering algorithms. Modern GPUs have become increasingly hard to evaluate because they must support more complex functions and GPU vendors do not release architectural details. To study GPU design, this thesis proposes a cycle-accurate, execution-driven GPU simulation framework. In this framework, the GPU simulator core is modeled as a pipelined processor and includes a detailed timing model of the memory system for more accurate simulation. The GPU simulator executes rendering commands converted from a stream of OpenGL function calls and simulates GPU behavior in a cycle-accurate fashion. The OpenGL trace is captured from real 3D games (e.g., Quake 3). To demonstrate the applicability of the framework, this thesis also presents a study of the graphics memory system, analyzing the performance effect of applying different memory access scheduling policies. The experimental results show that an adaptive policy is the most effective.

Parallel Keywords

GPU; Simulator

