Dynamic Memory Optimization and Parallelism Management for OpenCL

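Multiprocessor platforms have become the prevailing way to raise performance. They fall into two classes: homogeneous multiprocessor platforms and heterogeneous multiprocessor platforms. For some applications, such as scientific computing (linear algebra in particular) and digital signal processing, execution on heterogeneous multiprocessors usually outperforms execution on homogeneous ones. Programming heterogeneous multicore systems, however, is difficult and tedious. The Open Computing Language (OpenCL), published by the Khronos Group, is a widely used standard programming specification for heterogeneous multiprocessors. It supports execution on different kinds of devices, including CPUs, GPUs, and accelerators. Because GPUs are now in widespread use, our research focuses on heterogeneous platforms composed of CPUs and GPUs. In GPU computing, two main factors affect performance. The first is work distribution: how the application is divided into work-items and how work-items are organized into work-groups. The second is how fully the characteristics of the GPU memory architecture are exploited. If exploiting the GPU is left entirely to the programmer, the programmer must be skilled in parallel programming and also thoroughly familiar with the hardware specifications. We therefore propose an automatic optimizing compilation pass for OpenCL kernels. Its input is a naive OpenCL kernel that is algorithmically correct but written without regard to performance. The pass applies the following steps to this kernel: kernel analysis, work-group rearrangement, memory access coalescing, and work-item merging. In addition, we implement the design in a runtime system, so that the optimization parameters can be adjusted dynamically for different hardware specifications. Optimizing at run time costs extra time, but programs with compute-heavy kernels or large input data can still gain performance overall. Experimental results show an average speedup of 1.3 times for the programs we selected. This thesis thus realizes a runtime compiler framework that optimizes OpenCL kernels and also takes the hardware specifications of the target platform into account as optimization parameters.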
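To make the memory access coalescing step concrete, the following minimal sketch counts how many memory segments a group of work-items touches in one load. It is an illustrative model only, not the thesis's compilation pass: the 128-byte segment size, the group width of 32, and the function names are all assumptions introduced here.

```python
# Illustrative model only: counts the memory segments that one group of
# work-items touches in a single load. The constants below are assumed
# values for illustration, not figures taken from the thesis.

SEGMENT_BYTES = 128   # assumed size of one memory transaction
GROUP_WIDTH = 32      # assumed number of work-items issuing a load together
ELEM_BYTES = 4        # one 32-bit float


def segments_touched(addresses):
    """Return how many distinct segments a set of byte addresses falls in."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def row_per_item(row_len):
    """Uncoalesced pattern: work-item i reads the first element of row i."""
    return [i * row_len * ELEM_BYTES for i in range(GROUP_WIDTH)]


def column_per_item():
    """Coalesced pattern: work-item i reads element i of the same row."""
    return [i * ELEM_BYTES for i in range(GROUP_WIDTH)]


uncoalesced = segments_touched(row_per_item(row_len=1024))
coalesced = segments_touched(column_per_item())
print(uncoalesced, coalesced)
```

Under these assumptions the strided pattern touches one segment per work-item, while the contiguous pattern touches a single segment. Rewriting a kernel so that neighbouring work-items read neighbouring addresses is the effect a coalescing transformation aims for.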

Keywords

Open Computing Language; LLVM; optimization; kernel function

Parallel Abstract

Multiprocessor platforms have recently become the prevailing route to high performance. They may be categorized into homogeneous and heterogeneous multiprocessor platforms. For applications with a high degree of concurrency, such as digital signal processing and linear algebra matrix operations, execution on heterogeneous multiprocessors usually achieves higher performance than on homogeneous multiprocessors. However, programming applications for heterogeneous multiprocessors is difficult and tedious. OpenCL (Open Computing Language), released by the Khronos Group, is one of the programming standards for heterogeneous multiprocessors and provides portability across such platforms. OpenCL supports three types of devices: CPUs (Central Processing Units), GPUs (Graphics Processing Units), and accelerators. Our research focuses on platforms with CPUs and GPUs because GPUs are now in widespread use. On such a platform, two programming issues significantly affect GPU computing performance. One is workload distribution, which includes parallelizing the application into work-items and grouping work-items into work-groups. The other is the use of the GPU memory hierarchy. To fully exploit the characteristics of GPUs, programmers must be not only proficient in parallel programming but also familiar with the hardware specifications. Therefore, in this thesis, we propose a compilation pass that automatically optimizes OpenCL kernels. The input is a naïve kernel that is functionally correct but has not been optimized for performance. Our compilation pass transforms the input kernel function using kernel function analysis, work-group rearrangement, memory coalescing, and work-item merging.
In addition, our framework is implemented in a runtime system so that it can dynamically adjust the optimization parameters according to the hardware specifications. Although optimizations performed at run time incur execution-time overhead, in most cases the overhead is amortized by heavy kernel computation or large input data. Experimental results on our benchmarks show an average speedup of 1.3 times. We thus design and implement an OpenCL optimization pass, in a runtime compiler framework based on LLVM, that takes the hardware specifications of the target platform into account.
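As a rough illustration of adjusting optimization parameters to the hardware at run time, the sketch below picks a work-group size and a work-item merge factor from a few device properties. The heuristic, the `DeviceSpec` type, and the `choose_launch` function are hypothetical and are not the method described in the thesis. The only real names are the OpenCL device-info parameters mentioned in the comments.

```python
from dataclasses import dataclass


@dataclass
class DeviceSpec:
    """Hypothetical subset of values a runtime could query from a device."""
    max_work_group_size: int  # cf. CL_DEVICE_MAX_WORK_GROUP_SIZE
    simd_width: int           # assumed warp/wavefront width of the device
    compute_units: int        # cf. CL_DEVICE_MAX_COMPUTE_UNITS


def choose_launch(global_size, dev, max_merge=8):
    """Return (work_group_size, merge_factor) under a simple assumed heuristic."""
    # Largest multiple of the SIMD width that fits the work-group limit;
    # fall back to the limit itself if it is smaller than the SIMD width.
    wg = (dev.max_work_group_size // dev.simd_width) * dev.simd_width
    wg = wg or dev.max_work_group_size

    # Merge work-items (each merged item does the work of `merge` originals)
    # only while every compute unit still receives at least one full group
    # and the work divides evenly.
    merge = 1
    while (merge * 2 <= max_merge
           and global_size % (merge * 2) == 0
           and global_size // (merge * 2) >= wg * dev.compute_units):
        merge *= 2

    # Never ask for a group larger than the number of remaining work-items.
    wg = min(wg, global_size // merge)
    return wg, merge


small_gpu = DeviceSpec(max_work_group_size=256, simd_width=32, compute_units=8)
print(choose_launch(1 << 20, small_gpu))
print(choose_launch(4096, small_gpu))
```

The same kernel therefore gets a different merge factor depending on the problem size and the device, which is the kind of decision that is only possible once the target hardware is known. A real launcher would also need to round the global size up to a multiple of the chosen work-group size.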

Parallel Keywords

OpenCL; LLVM; optimization; kernel function
