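Full-System Simulation of Heterogeneous Multi-Core Systems and a Cooperative Thread Array (CTA)-Based Data Prefetching Mechanism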

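As architectural design has evolved, multi-core processors have moved from the era of traditional homogeneous multi-core processors into the era of heterogeneous multi-core processors. AMD's Heterogeneous System Architecture (HSA) integrates the CPU and the GPU on a single chip, and the two share a common address space through hUMA (heterogeneous Uniform Memory Access). With this shared address space, the CPU can access data on the GPU directly, and the GPU no longer needs to make its own copy of data from the CPU's address space before computing. However, when the CPU and GPU share memory, they also share many other resources, including caches and buses, and cache coherence between them becomes a new problem. Because HSA was proposed only recently, no complete simulation environment yet exists for studying these architectural issues. This thesis therefore proposes a complete simulation framework that combines a CPU simulator with a GPU simulator and implements a shared address space between them. Finally, we implement a data prefetching mechanism based on cooperative thread arrays (CTAs) on the GPU side and compare traditional CPU data prefetching with GPU data prefetching, with the goal of improving GPU performance.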
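The practical consequence of a shared address space is that the CPU side and the GPU side translate the same virtual pointer through the same mappings, so no copy between disjoint memories is needed. The Python sketch below models that idea in the simplest possible form; it is not how the thesis actually couples its CPU and GPU simulators. The names `SharedVirtualMemory`, `cpu_fill`, and `gpu_scale`, the 4 KiB page size, and the flat page table are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical sketch: one page table shared by a CPU-side and a GPU-side model.
PAGE_SIZE = 4096  # assumed page size


class SharedVirtualMemory:
    """A single page table and frame store used by both device models."""

    def __init__(self) -> None:
        self.page_table: Dict[int, int] = {}    # virtual page number -> frame number
        self.frames: Dict[int, bytearray] = {}  # frame number -> page contents
        self.next_frame = 0

    def map(self, vaddr: int, size: int) -> None:
        """Back every virtual page in [vaddr, vaddr + size) with a frame."""
        for vpn in range(vaddr // PAGE_SIZE, (vaddr + size - 1) // PAGE_SIZE + 1):
            if vpn not in self.page_table:
                self.page_table[vpn] = self.next_frame
                self.frames[self.next_frame] = bytearray(PAGE_SIZE)
                self.next_frame += 1

    def _translate(self, vaddr: int) -> Tuple[bytearray, int]:
        # Raises KeyError for unmapped addresses (a stand-in for a page fault).
        frame = self.page_table[vaddr // PAGE_SIZE]
        return self.frames[frame], vaddr % PAGE_SIZE

    # Accesses are assumed 4-byte aligned, so a word never straddles two pages.
    def read_u32(self, vaddr: int) -> int:
        page, off = self._translate(vaddr)
        return int.from_bytes(page[off:off + 4], "little")

    def write_u32(self, vaddr: int, value: int) -> None:
        page, off = self._translate(vaddr)
        page[off:off + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def cpu_fill(mem: SharedVirtualMemory, base: int, n: int) -> None:
    """CPU-side model: initialise an array through the shared mapping."""
    for i in range(n):
        mem.write_u32(base + 4 * i, i)


def gpu_scale(mem: SharedVirtualMemory, base: int, n: int, factor: int) -> None:
    """GPU-side model: one 'thread' per element dereferences the same
    virtual pointer; no host-to-device copy is performed."""
    for tid in range(n):
        addr = base + 4 * tid
        mem.write_u32(addr, mem.read_u32(addr) * factor)
```

For example, after `mem.map(0x70000FF8, 16)`, calling `cpu_fill(mem, 0x70000FF8, 4)` and then `gpu_scale(mem, 0x70000FF8, 4, 3)` leaves the values 0, 3, 6, 9 in the array, which spans two pages. The GPU-side function reads exactly what the CPU-side function wrote, because both go through the one page table.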

Keywords

Heterogeneous multi-core; simulation; cooperative thread array; data prefetching

English Abstract

Computer architecture is transitioning from the homogeneous multicore era into the heterogeneous multicore era. AMD's Heterogeneous System Architecture (HSA) physically integrates CPUs and GPUs on one chip and provides a shared virtual address space between them. With shared virtual memory, the time spent moving data between the devices' disjoint memories can be saved. Sharing, however, raises new resource management issues, such as management of the shared last-level cache, the MMU for the CPU and GPU, and main memory. In addition, coherence between the CPU and GPU becomes a new issue. Yet no complete simulator exists that provides a platform for studying these issues. In this thesis, we propose a full-system simulation framework for HSA that combines a CPU model, QEMU, with a GPU model, GPGPU-Sim. For HSA, we support part of the OpenCL 2.0 runtime and global memory segments with a shared address space between the CPU and GPU. Finally, we compare a traditional CPU prefetching mechanism with GPU prefetching mechanisms and implement a CTA-based prefetching mechanism to improve GPU performance.
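The abstract contrasts traditional CPU prefetching with a CTA-based GPU prefetcher but does not describe either design in detail. The Python sketch below is an illustration under stated assumptions, not the thesis's mechanism: `PCStridePrefetcher` is a generic per-PC stride prefetcher of the kind commonly used on CPUs, and `CTAStridePrefetcher` is a hypothetical CTA-aware variant that learns the address stride between consecutive CTAs. The class names, the "same stride twice" confirmation rule, and the `degree` parameter are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0


class PCStridePrefetcher:
    """CPU-style stride prefetcher: one table entry per load PC.

    A prefetch is issued only after the same non-zero stride is seen
    twice in a row for that PC (an assumed confirmation rule).
    """

    def __init__(self, degree: int = 1) -> None:
        self.table: Dict[int, StrideEntry] = {}
        self.degree = degree  # number of strides to run ahead

    def access(self, pc: int, addr: int) -> List[int]:
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        confirmed = stride != 0 and stride == entry.stride
        entry.stride, entry.last_addr = stride, addr
        if not confirmed:
            return []
        return [addr + stride * k for k in range(1, self.degree + 1)]


class CTAStridePrefetcher:
    """Hypothetical CTA-aware variant (not the thesis's design).

    Trains only on the first access each CTA makes with a given load PC,
    learns the address stride between CTAs, and prefetches the data the
    next CTA(s) are expected to touch.
    """

    def __init__(self, degree: int = 1) -> None:
        self.last_cta: Dict[int, Tuple[int, int]] = {}  # pc -> (cta_id, first addr)
        self.stride: Dict[int, int] = {}                # pc -> per-CTA stride
        self.trained: Set[Tuple[int, int]] = set()      # (pc, cta_id) already seen
        self.degree = degree

    def access(self, pc: int, cta_id: int, addr: int) -> List[int]:
        if (pc, cta_id) in self.trained:
            return []  # later accesses from the same CTA do not train the table
        self.trained.add((pc, cta_id))
        prev = self.last_cta.get(pc)
        self.last_cta[pc] = (cta_id, addr)
        if prev is None:
            return []
        delta_cta = cta_id - prev[0]
        delta_addr = addr - prev[1]
        if delta_cta == 0 or delta_addr % delta_cta != 0:
            return []
        per_cta = delta_addr // delta_cta
        confirmed = per_cta != 0 and self.stride.get(pc) == per_cta
        self.stride[pc] = per_cta
        if not confirmed:
            return []
        return [addr + per_cta * k for k in range(1, self.degree + 1)]
```

On a made-up trace where warps from CTA 0 and CTA 1 interleave on one load PC (addresses 0x1000, 0x1400, 0x1080, 0x1480), the per-PC stride alternates between +0x400 and -0x380, so `PCStridePrefetcher` never issues a prefetch. `CTAStridePrefetcher` ignores the repeated accesses from the same CTA, and once CTAs 0, 1, and 2 have made their first accesses at 0x1000, 0x1400, and 0x1800, it predicts 0x1C00 for CTA 3. This illustrates one plausible reason to key a GPU prefetcher on CTAs rather than on the raw access stream; it is not a result reported in the thesis.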

English Keywords

Simulation; heterogeneous; HSA; prefetch
