
On the Effectiveness of Causality-aware Trace-driven NoC Simulation for GPGPUs

Advisor: 金仲達

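Abstract (translated from the Chinese)


Heterogeneous multicore architectures are gradually becoming the mainstream of computer systems. Within this field, general-purpose graphics processing units (GPGPUs) are an indispensable part of achieving high-density computation. As GPGPU complexity rises, designing an efficient GPGPU requires good tools. Among these, execution-driven simulators are often used to assess the quality of a GPGPU architecture design. Execution-driven simulators provide highly accurate and comprehensive performance data, but they often need a large amount of simulation time, and this drawback slows design and development. Trace-driven simulators, on the other hand, simulate only specific target components, such as the network-on-chip (NoC) or the cache hierarchy, and rely on execution traces to mimic other system components such as the compute cores. This approach makes trace-driven simulators faster and better suited as design and development tools. However, an execution trace is the result of running on the machine that generated the trace, not on the machine being simulated. Trace-driven simulators therefore often produce large errors in performance data. A recent trend is to use the causality among trace events to adjust the time at which each event occurs, rather than using the absolute time of the trace-generating machine. In this thesis, we apply the concept of causality-aware trace-driven simulation to evaluate the NoC performance of a GPGPU. We use a widely used GPGPU simulator, Multi2Sim, to study how to obtain causality information from its execution traces. One difficulty in determining the causality among NoC events is the memory latency hiding mechanism, which allows multiple memory accesses to be processed at the same time. We discuss how to use GPGPU memory barrier instructions to identify causality. We then feed the traces annotated with causality to Garnet, a well-known trace-driven NoC simulator from academia that we have modified. Finally, our experimental results show that the causality-aware Garnet obtains the same performance trend as the execution-driven simulator Multi2Sim, whereas the original Garnet does not.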

Parallel Abstract


Heterogeneous computer architecture is becoming the mainstream of computer systems. In this landscape, general-purpose graphics processing units (GPGPUs) are indispensable for supporting ultra-high-density computing. As the complexity of GPGPUs increases, designing efficient GPGPUs requires good supporting tools. Among them, execution-driven simulators are often used to evaluate the architectural designs of GPGPUs. Execution-driven simulators can provide accurate and comprehensive performance data, but they often require very long simulation times, which slows down design space exploration. Trace-driven simulators, on the other hand, simulate only the specific components of interest, e.g. the Network-on-Chip (NoC) or the cache hierarchy, and rely on execution traces to mimic the operations of other components, e.g. processor cores. As a result, trace-driven simulators are fast and well suited to design space exploration. However, traces are the execution results of the trace-generating machine, not of the target machine. Thus, trace-driven simulators often produce performance data with large error margins. A recent trend in trace-driven simulation is to use the causal relationships among the trace events to adjust the event timing, instead of using the absolute event times from the trace-generating machine. In this thesis, we apply the concept of causality-aware trace-driven simulation to the evaluation of the NoC of GPGPUs. We take a widely used execution-driven GPGPU simulator, Multi2Sim, and study how to extract causality information from its execution trace. One difficulty in determining the causal relationships of NoC events for GPGPUs is the latency hiding mechanism, which allows multiple memory access requests to be outstanding at the same time. We discuss how to leverage the memory fence instructions of GPGPUs to identify the causal relationships. The extracted causality traces are then fed into a well-known trace-driven NoC simulator, Garnet, which is modified to be causality-aware. Our evaluation results show that the causality-aware Garnet can match the performance trend obtained from the execution-driven simulator Multi2Sim, while the original Garnet cannot.
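
The idea of fence-based causality extraction followed by causality-aware replay can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it does not use the trace formats of Multi2Sim or Garnet. The function names `fence_dependencies` and `replay`, the `"mem"`/`"fence"` trace encoding, and the dependency rule itself (a request waits for every request issued before the most recent fence, while requests between two fences may be outstanding together) are all assumptions made for illustration.

```python
def fence_dependencies(trace):
    """Derive dependencies for one thread's trace (hypothetical encoding).

    trace: list of "mem" / "fence" strings in program order.
    Returns {index of mem request: [indices it must wait for]}.
    Assumed rule: requests between two fences are independent of each
    other; each waits for the whole group closed by the previous fence.
    """
    deps = {}
    prev_group = []  # requests already ordered before us by a fence
    cur_group = []   # requests issued since the last fence
    for i, kind in enumerate(trace):
        if kind == "fence":
            if cur_group:
                prev_group, cur_group = cur_group, []
        else:
            deps[i] = list(prev_group)
            cur_group.append(i)
    return deps


def replay(deps, gap, latency):
    """Recompute injection times from dependencies, not from timestamps.

    gap[i]:     compute cycles between the last dependency completing
                and request i being injected (taken from the trace).
    latency[i]: round-trip latency of request i on the *target* network.
    """
    inject, done = {}, {}
    for i in sorted(deps):  # program order is a valid topological order
        ready = max((done[d] for d in deps[i]), default=0)
        inject[i] = ready + gap[i]
        done[i] = inject[i] + latency[i]
    return inject, done


trace = ["mem", "mem", "fence", "mem"]
deps = fence_dependencies(trace)
gap = {0: 0, 1: 1, 3: 2}

# Same trace replayed on a fast and a slow hypothetical target network.
fast_inject, fast_done = replay(deps, gap, {0: 10, 1: 10, 3: 10})
slow_inject, slow_done = replay(deps, gap, {0: 20, 1: 20, 3: 20})
print(fast_inject[3], slow_inject[3])
```

In this sketch the request after the fence is injected later on the slower network, because its injection time follows the completion of the requests it depends on. A replay driven purely by recorded timestamps would inject it at the same cycle on both networks.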

Parallel Keywords

NoC, Trace-driven, HSA, GPGPU
