
MMU Cache System and Thread Block Scheduling Enhancement for Virtual Memory Support on GPGPU

Advisor: 陳添福

Abstract


As the dark-silicon phenomenon grows more pronounced in advanced process nodes, chip performance becomes bounded by the power budget, making customized hardware accelerators increasingly important. The graphics processing unit (GPU), originally an accelerator for graphics computation, has evolved into a programmable general-purpose processor (GPGPU) and is now moving toward heterogeneous system architecture (HSA), in which the CPU and various customized accelerators (GPU, DSP, etc.) share the same virtual memory space. Every compute unit then needs a memory management unit (MMU) to perform virtual address translation; however, the GPU's massive concurrent memory accesses place a heavy burden on the MMU, and address translation consumes a growing fraction of execution time. This work simulates virtual address translation on the GPU, places L1 and L2 TLBs in the memory management unit, and analyzes the causes of performance loss for each benchmark. We further analyze the relationship between GPU block IDs and the addresses they access, and propose a thread block scheduling policy that accounts for the cost of address translation.
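The two-level TLB organization described in the abstract can be sketched as a minimal software model. This is illustrative only: the per-SM L1 capacity (16 entries), shared L2 capacity (512 entries), page size, and the synthetic access trace are assumed parameters for the sketch, not the thesis's actual simulator configuration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4 KiB pages assumed for this sketch

class TLB:
    """Fully-associative TLB with LRU replacement."""
    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> present

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)  # refresh LRU position
            return True
        return False

    def insert(self, vpn):
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict least-recently-used entry
        self.map[vpn] = True

def translate(addr, l1, l2, stats):
    """Probe the private L1 TLB, then the shared L2 TLB; on a double
    miss, charge a page-table walk and fill both levels."""
    vpn = addr // PAGE_SIZE
    if l1.lookup(vpn):
        stats["l1_hit"] += 1
    elif l2.lookup(vpn):
        stats["l2_hit"] += 1
        l1.insert(vpn)
    else:
        stats["walk"] += 1  # MMU page-table walk
        l2.insert(vpn)
        l1.insert(vpn)

# Toy trace: sequential accesses striding through 64 pages.
l1, l2 = TLB(16), TLB(512)
stats = {"l1_hit": 0, "l2_hit": 0, "walk": 0}
for addr in range(0, 64 * PAGE_SIZE, 256):
    translate(addr, l1, l2, stats)
print(stats)
```

On this sequential trace every page is walked once and then hits in the L1, which is why the thesis measures irregular benchmarks: it is divergent, low-locality access patterns that defeat the L1 and stress the shared L2 and page-table walker.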

Abstract (English)


As the "dark silicon" phenomenon becomes more pronounced in advanced process nodes, IC performance will soon be bounded by the power budget, and research on customized hardware accelerators is gradually taking the place of mainstream CPU research. The Graphics Processing Unit (GPU), originally developed to accelerate graphics computation, is evolving into a programmable, general-purpose computing unit (GPGPU). In the future, heterogeneous system architecture (HSA) will merge all computing units (CPU, GPU, DSP, etc.) into the same virtual address space, simplifying programming and allowing data sharing between these units. As a result, each unit will need an MMU to translate its virtual addresses into physical addresses. However, given the large number of memory accesses issued by these units, system performance may be degraded by the address translation process. This thesis evaluates the impact of virtual address translation on the GPU through software simulation. We propose placing a private L1 TLB and a shared L2 TLB to reduce address translation overhead, and we analyze the correlation between block IDs and memory address traces. By collecting runtime information, we can select a better thread block scheduling strategy to achieve higher performance.
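The block-ID/address correlation idea can be illustrated with a small scheduling sketch: given a profile mapping each block ID to the set of pages it touches (names `schedule_by_page_affinity`, the per-SM cap, and the synthetic profile are all hypothetical, not the thesis's scheduler), a greedy policy co-locates blocks that share pages on the same SM so each private L1 TLB covers fewer distinct pages.

```python
import math

def schedule_by_page_affinity(block_pages, num_sms):
    """Greedy locality-aware scheduler: assign each block to the SM
    whose already-assigned blocks share the most pages with it,
    subject to an even per-SM block cap."""
    cap = math.ceil(len(block_pages) / num_sms)  # keep load balanced
    sm_pages = [set() for _ in range(num_sms)]
    sm_blocks = [[] for _ in range(num_sms)]
    for bid, pages in block_pages.items():
        candidates = [s for s in range(num_sms) if len(sm_blocks[s]) < cap]
        best = max(candidates, key=lambda s: len(sm_pages[s] & pages))
        sm_blocks[best].append(bid)
        sm_pages[best] |= pages
    return sm_blocks

# Hypothetical profile: consecutive block IDs touch overlapping pages,
# the correlation pattern the thesis observes between block ID and address.
profile = {bid: {bid // 4, bid // 4 + 1} for bid in range(16)}
assignment = schedule_by_page_affinity(profile, num_sms=4)
print(assignment)
```

With this profile the greedy pass groups consecutive block IDs onto the same SM, so each SM's working set spans only two pages instead of five; a round-robin assignment would spread every page across all four private L1 TLBs.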

Keywords

GPU, HSA, MMU, virtual address translation
