以LLVM為基礎轉換OpenACC程式到HSA環境

Translating OpenACC Program for HSA Environment Based on LLVM

Advisor: 單智君

Abstract


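In recent years, multiprocessor platforms have become a trend for accelerating programs. Multiprocessor platforms are generally either homogeneous or heterogeneous. For some programs that can be parallelized, a heterogeneous multiprocessor platform usually achieves higher performance and energy efficiency than a homogeneous one. However, because of data management and communication among the processors, writing heterogeneous multicore programs is harder than writing homogeneous multicore programs. The recently emerging Heterogeneous System Architecture (HSA) reduces the heavy data movement between the CPU and the GPU (Graphics Processing Unit) found in traditional heterogeneous multicore architectures by letting the CPU and GPU access shared virtual memory, yet writing heterogeneous multicore programs for this kind of architecture remains complex and error-prone. We therefore design an LLVM-based OpenACC compiler that automatically translates OpenACC programs into HSA programs. In this thesis, we design and implement an extended version of Clang, an LLVM front-end compiler, so that it supports OpenACC programs and translates them into LLVM intermediate code that preserves the OpenACC directive information as LLVM Metadata (hereafter PLIR, Parallel Language IR). Next, the PLIR_annotation Parser that we designed analyzes the annotation information in the PLIR and produces a dedicated data structure to store the information related to the OpenACC directives. Finally, the HSA Host/Kernel IR generator that we designed (hereafter HSA-HKIR generator) analyzes the data structure produced by the PLIR_annotation Parser and generates the corresponding Host LLVM IR and Kernel LLVM IR. The Kernel LLVM IR is then translated into an HSAIL kernel function by the existing HSAIL Backend in the LLVM back end and the HSAIL Assembler; the Host LLVM IR is compiled into an x86 executable by the LLVM static compiler and the System Linker and executed in an HSA environment. Experimental results show that, for the eight OpenACC programs we selected, the HSA programs produced by this OpenACC compiler achieve an average speedup of 7.78x over the sequential programs, and an average speedup of 2.91x over the HSA programs produced by the OpenMP compiler proposed by the HSA Foundation.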

Keywords

Heterogeneous System Architecture

Parallel Abstract


In recent years, multiprocessor platforms have become a common way to accelerate programs. They can generally be divided into two types: homogeneous and heterogeneous. For programs that can be parallelized, heterogeneous multiprocessor platforms usually deliver higher performance and energy efficiency than homogeneous ones. However, writing programs for heterogeneous multiprocessor platforms is harder than for homogeneous ones because of data management and communication between the processors. The recently proposed Heterogeneous System Architecture (HSA) reduces the heavy data movement between the CPU and the GPU found in traditional heterogeneous multiprocessor platforms by giving both access to shared virtual memory, but writing programs for HSA platforms is still complex and error-prone. We therefore design and implement an LLVM-based OpenACC compiler that automatically translates OpenACC programs into HSA programs. In this thesis, we design and implement an extended version of Clang, one of the LLVM front ends, that translates OpenACC programs into PLIR (Parallel Language IR), an LLVM IR that carries the information in the OpenACC directives and clauses as LLVM metadata. Next, we design a PLIR_annotation Parser, which analyzes the annotations in the PLIR and builds a dedicated data structure holding the information from the OpenACC directives and clauses. We then design an HSA Host/Kernel IR Generator (HSA-HKIR generator), which analyzes that data structure and generates the corresponding Host LLVM IR and Kernel LLVM IR. The Kernel LLVM IR is translated into an HSA kernel function by the HSAIL Backend and the HSAIL Assembler. The Host LLVM IR is compiled into an x86 executable by the LLVM static compiler (LLC) and the system linker. Finally, the x86 executable runs in an HSA environment.
The experimental results show that, for the eight OpenACC programs we selected, the HSA programs generated by our OpenACC compiler achieve an average speedup of 7.78x over the corresponding sequential programs. Compared with the HSA programs generated by the OpenMP compiler proposed by the HSA Foundation, they achieve an average speedup of 2.91x.
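
For readers unfamiliar with OpenACC, the following minimal sketch shows the kind of annotated C source that an OpenACC compiler takes as input. It is an illustration only and is not taken from the thesis or its benchmark programs: the function name vec_add and the choice of a vector addition are assumptions, while the parallel loop directive and the copyin/copyout data clauses come from the OpenACC specification. A compiler without OpenACC support ignores the pragma and runs the loop sequentially.

```c
#include <assert.h>

/* Hypothetical example (not from the thesis): element-wise vector addition.
 * The OpenACC directive marks the loop for offloading to an accelerator;
 * the data clauses state which array sections are inputs and outputs. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    assert(n >= 0); /* guard against a negative length */

    /* copyin: a and b are read on the device; copyout: c is written back. */
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

In a flow like the one described above, the directive and its clauses are the information that would be preserved as metadata in the intermediate code, and the loop body is the part that would end up in the kernel.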

