  • Thesis

OpenCL Runtime Supports for Multi-core PAC DSPs

OpenCL Runtime Library Supporting Multi-core PAC DSPs

Advisor: 李政崑


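OpenCL is an industry standard for integrating heterogeneous multi-core programming. At run time, OpenCL manages a program's computation in the form of work-groups, each composed of work-items. Each work-item can execute in parallel on a processing element according to its globalID or localID, while a work-group executes on a compute unit. Although OpenCL has been widely and successfully applied on CPU, GPU, and GPGPU platforms, it is rarely used on embedded multi-core DSP platforms. The main contribution of this thesis is an OpenCL runtime library and a compiler flow for an embedded multi-core DSP platform named PACDUO. PACDUO consists of one MPU and two five-way issue VLIW DSPs. Each DSP contains three clusters: one controls program flow and the other two are dedicated to computation. To reduce the communication cost between the MPU and the DSPs and to make full use of the DSP hardware resources, this thesis integrates kernel serialization and kernel vectorization into the compilation flow. In the experiments, we use a set of OpenCL programs to test the usability of the OpenCL runtime library and the compiler optimizations. According to the experimental results, we obtain a 1.99x speedup in program performance with two DSPs.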


OpenCL is an open industry standard that aims to integrate heterogeneous multi-core programming. To unify parallel computing, OpenCL organizes computations into work-groups, each consisting of work-items. A work-item executes independently on a processing element, identified by its globalID and localID, and a work-group executes on a compute unit. Although OpenCL framework implementations have had early success on CPU, GPU, and GPGPU platforms, they are rarely implemented on embedded multi-core DSP systems. This paper presents an OpenCL runtime library and compiler flow support for an embedded multi-core DSP system. The target platform is a heterogeneous multi-core embedded system called PACDUO, which consists of one MPU and two five-way issue VLIW DSPs. Each DSP includes three clusters: one for control flow and the other two for computation. To reduce context switch overhead and exploit the clusters, kernel serialization and kernel vectorization are integrated into the compiler flow. In the experiments, this paper applies a set of OpenCL benchmark programs to evaluate the usability of the runtime library and the compiler optimizations. The experimental results show a nearly 2-fold multi-core performance speedup with two DSPs.
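The execution model and the idea behind kernel serialization can be illustrated with a small sketch. This is a minimal model in Python, not the thesis's implementation: the function names (`vec_add_item`, `run_work_group`, `run_ndrange`), the vector-add kernel, and the round-robin assignment of work-groups to compute units are all assumptions made for illustration. It shows a work-group's work-items being executed by a single loop on one compute unit, with the global ID derived from the group ID and local ID.

```python
def vec_add_item(global_id, a, b, c):
    """Body of one work-item; global_id stands in for get_global_id(0)."""
    c[global_id] = a[global_id] + b[global_id]


def run_work_group(group_id, local_size, a, b, c):
    """Serialized work-group: a loop over local IDs replaces parallel work-items."""
    for local_id in range(local_size):
        global_id = group_id * local_size + local_id
        vec_add_item(global_id, a, b, c)


def run_ndrange(global_size, local_size, num_units, a, b):
    """Assign work-groups to compute units round-robin (an assumed policy),
    then run each group serially.

    Returns the output buffer and, per compute unit, the group IDs it ran.
    """
    assert global_size % local_size == 0, "global size must be a multiple of local size"
    num_groups = global_size // local_size
    c = [0] * global_size
    schedule = [[] for _ in range(num_units)]
    for group_id in range(num_groups):
        schedule[group_id % num_units].append(group_id)
    # The units run one after another here; on real hardware they would run concurrently.
    for unit_groups in schedule:
        for group_id in unit_groups:
            run_work_group(group_id, local_size, a, b, c)
    return c, schedule


a = list(range(16))
b = [10 * x for x in a]
c, schedule = run_ndrange(global_size=16, local_size=4, num_units=2, a=a, b=b)
print(c[:4])     # [0, 11, 22, 33]
print(schedule)  # [[0, 2], [1, 3]]
```

With two compute units standing in for the two DSPs, each unit receives half of the work-groups. A balanced split of this kind is one plausible way a speedup close to 2 could arise, although the sketch says nothing about the actual scheduling used on PACDUO.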


OpenCL Multi-core PAC DSP Runtime


