Scheduling Optimization of the OpenVX Framework on Mobile Embedded Multi-Core Systems

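In recent years, new mobile embedded devices have adopted heterogeneous multi-core architectures to improve performance within a limited energy budget. For programming such systems, OpenVX provides a standard framework for computer vision processing. This framework uses a graph-based execution model to describe the relationship between computation and data flow. Each computation node in the graph can be dispatched to a different compute device. Examples include sequential C or the parallel OpenMP runtime on a multi-core central processing unit (CPU), the parallel language OpenCL on a graphics processing unit (GPU), remote procedure calls to a digital signal processor (DSP), or dedicated hardware. Efficiently assigning all the computation nodes to these different devices is therefore an optimization research problem. In this thesis, we propose an OpenVX graph scheduling method that considers both memory locality and system processing capability. The method works in two phases to dispatch computation nodes to different compute devices. The first phase performs node coarsening, which groups nodes that meet certain conditions into clusters. The second phase then schedules the nodes, dispatching each one to a suitable device according to its characteristics. We ran experiments on the Qualcomm DragonBoard 810 development board. The results show that the proposed two-phase scheduling method can effectively schedule OpenVX programs in a heterogeneous multi-core environment.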
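The abstract only outlines the first phase (node coarsening). The sketch below illustrates it under one assumed grouping condition, which is not necessarily the thesis's own. A node joins its producer's group only when all of the following hold:

- it is that producer's only consumer;
- it has exactly one producer;
- the group still shares at least one feasible target, so the intermediate image can stay in one device's memory.

The node names and the target labels (`cpu`, `gpu`, `dsp`) are hypothetical labels, not OpenVX API identifiers.

```python
from collections import defaultdict


def coarsen(nodes, edges, targets):
    """Group OpenVX-style nodes into linear chains (illustrative criterion only).

    nodes:   node names in topological order
    edges:   (producer, consumer) data dependencies
    targets: node -> set of targets that can execute it
    Returns a list of (group_nodes, shared_targets) pairs.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    group_of = {}        # node -> index of its group
    groups = []          # each group is a list of node names
    group_targets = []   # targets that every node in the group can run on
    for n in nodes:
        ps = pred[n]
        # Assumed rule: extend the producer's chain only across a 1-to-1 edge.
        if len(ps) == 1 and len(succ[ps[0]]) == 1:
            gi = group_of[ps[0]]
            common = group_targets[gi] & targets[n]
            if common:  # the merged group must still share at least one target
                groups[gi].append(n)
                group_targets[gi] = common
                group_of[n] = gi
                continue
        # Otherwise the node starts a new group of its own.
        group_of[n] = len(groups)
        groups.append([n])
        group_targets.append(set(targets[n]))
    return list(zip(groups, group_targets))


# Hypothetical edge-detection-like pipeline; names and targets are made up.
nodes = ["color_convert", "gaussian", "sobel", "magnitude", "phase", "non_max"]
edges = [
    ("color_convert", "gaussian"),
    ("gaussian", "sobel"),
    ("sobel", "magnitude"),
    ("sobel", "phase"),
    ("magnitude", "non_max"),
    ("phase", "non_max"),
]
targets = {
    "color_convert": {"cpu", "gpu"},
    "gaussian": {"cpu", "gpu", "dsp"},
    "sobel": {"cpu", "gpu"},
    "magnitude": {"cpu", "gpu"},
    "phase": {"cpu"},
    "non_max": {"cpu", "gpu"},
}

for group, shared in coarsen(nodes, edges, targets):
    print(group, sorted(shared))
```

Here `color_convert`, `gaussian` and `sobel` collapse into one group that can run on `cpu` or `gpu`. The fan-out at `sobel` and the fan-in at `non_max` stop further merging under this assumed rule.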

Keywords

OpenVX; scheduling; node coarsening; mobile embedded systems

Parallel Abstract

Modern mobile embedded systems use heterogeneous multi-core architectures to improve performance under energy constraints. To program such systems, OpenVX provides a standard programming framework for computer vision processing. OpenVX uses a graph-based execution model to describe computation behavior and data-flow relationships. Each computation node in the graph can be dispatched to a different target. Possible targets include multi-core CPUs running C or the OpenMP runtime, GPUs running OpenCL, a DSP reached through remote procedure calls, or even dedicated hardware. Efficiently scheduling all the computation nodes onto these different targets therefore opens up optimization opportunities. In this thesis, we propose a method for scheduling OpenVX task graphs that considers both memory locality and system throughput. The proposed two-phase method first applies coarsening schemes to cluster nodes together. In the second phase, it schedules the nodes onto different targets. Experimental results on the Qualcomm DragonBoard 810 development board show that our scheme works well for scheduling OpenVX programs in heterogeneous multi-core environments.
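The second phase, dispatching (possibly coarsened) nodes to targets, can be illustrated with a generic greedy earliest-finish-time list scheduler, a simplified HEFT-style heuristic. This technique is swapped in for illustration only. It is not the thesis's scheduler, and it minimizes the finish time of a single graph execution rather than modeling throughput explicitly. Several inputs are assumptions:

- the task names are hypothetical;
- the per-target run times are made-up numbers;
- the flat 1 ms cross-device transfer cost is made up.

```python
def schedule(order, deps, cost, transfer_ms):
    """Greedy earliest-finish-time assignment of tasks to targets.

    order:       task names in topological order
    deps:        task -> list of predecessor tasks
    cost:        task -> {target: estimated run time in ms} (feasible targets only)
    transfer_ms: assumed flat cost of moving a result between different targets
    Returns (placement, finish_times, makespan).
    """
    free_at = {}   # target -> time at which it becomes idle
    placed = {}    # task -> chosen target
    finish = {}    # task -> finish time in ms
    for task in order:
        best = None
        # Targets are tried in name order, so ties go to the first name.
        for target, run in sorted(cost[task].items()):
            # Inputs are ready once every predecessor has finished, plus a
            # transfer penalty if that predecessor ran on a different target.
            ready = max(
                (finish[p] + (0.0 if placed[p] == target else transfer_ms)
                 for p in deps.get(task, [])),
                default=0.0,
            )
            end = max(ready, free_at.get(target, 0.0)) + run
            if best is None or end < best[0]:
                best = (end, target)
        end, target = best
        placed[task] = target
        finish[task] = end
        free_at[target] = end
    return placed, finish, max(finish.values())


# Made-up estimates for four hypothetical tasks.
order = ["blur_chain", "magnitude", "phase", "non_max"]
deps = {
    "magnitude": ["blur_chain"],
    "phase": ["blur_chain"],
    "non_max": ["magnitude", "phase"],
}
cost = {
    "blur_chain": {"cpu": 8.0, "gpu": 3.0},
    "magnitude": {"cpu": 4.0, "gpu": 2.0},
    "phase": {"cpu": 5.0},
    "non_max": {"cpu": 6.0, "gpu": 2.0},
}

placement, finish, makespan = schedule(order, deps, cost, transfer_ms=1.0)
print(placement)
print(makespan)
```

With these numbers, the CPU-only `phase` task runs on the CPU while `magnitude` runs on the GPU, and the graph finishes at 12.0 ms. A fuller scheduler would also rank tasks before placing them (as HEFT does with its upward rank) and model device memory. This sketch shows only the basic earliest-finish-time decision.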

Parallel Keywords

OpenVX; scheduling; node coarsening; mobile embedded systems

