

Enabling Fast Preemption via Dual-Kernel Support on GPUs

Advisor: 楊佳玲 (Chia-Lin Yang)

Abstract


Heterogeneous computing is an efficient way to improve system performance. In such heterogeneous systems, Graphics Processing Units (GPUs) play an important role in assisting Central Processing Units (CPUs) to accelerate applications. Beyond traditional graphics workloads, GPUs efficiently accelerate data-parallel workloads and are widely deployed in data centers, cloud computing, and even mobile devices. As the demand for application acceleration grows, preemption support on GPUs becomes increasingly important for guaranteeing Quality of Service (QoS) among applications, especially on resource-constrained mobile systems. However, because the GPU context is very large, traditional context switching incurs an enormous preemption cost: a high-priority task must wait a long time before it can preempt the running task, and system throughput degrades during the switch. Supporting fast preemption is therefore a critical issue for the true heterogeneous computing paradigm.

Recently, several preemption mechanisms based on GPU architectural extensions have been proposed. However, these schemes consider neither GPU resource utilization nor the fragmentation caused by fine-grained resource sharing among multiple kernels, so a high-priority task may still miss its deadline. To guarantee the QoS of high-priority tasks, we adopt dual-kernel concurrent execution to support fast preemption. Besides enabling fine-grained preemption, this approach also simplifies the fragmentation problem. First, we propose a resource allocation policy that avoids fragmentation. Second, we propose a victim selection scheme that minimizes the preemption cost while satisfying the required preemption latency. Experimental results show that, in terms of deadline violation rate, our mechanism comes within 2% of an ideal preemption scheme. In addition, compared with previously proposed preemption mechanisms, it improves resource utilization during preemption by 2.93x on average.
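The abstract only names the two proposed policies; their precise definitions appear in the thesis body, which is not reproduced on this page. Purely as an illustration of the kind of decision a victim selection scheme has to make, the following C++ sketch greedily evicts the low-priority thread blocks (CTAs) that have completed the least work, stopping once the freed registers, shared memory, and thread slots cover the high-priority kernel's demand, while keeping the accumulated eviction time within the required preemption latency. All type names, fields, and the cost model (lost progress, per-CTA eviction time) are assumptions made for this sketch, not the author's implementation.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Per-CTA bookkeeping assumed by this sketch (not taken from the thesis).
    struct CtaState {
        int      id;             // CTA identifier on this SM
        uint32_t regs;           // registers currently held
        uint32_t smem_bytes;     // shared memory currently held
        uint32_t threads;        // thread slots currently held
        double   progress;       // fraction of work completed; lost if evicted
        double   evict_time_us;  // estimated time to drain/flush this CTA
    };

    // Resources the high-priority kernel needs in order to launch on this SM.
    struct Demand {
        uint32_t regs;
        uint32_t smem_bytes;
        uint32_t threads;
    };

    // Greedy victim selection: free enough resources for `need` while keeping the
    // total eviction time within `latency_budget_us` and discarding as little
    // completed work as possible. Returns an empty vector if the budget cannot be met.
    std::vector<int> select_victims(std::vector<CtaState> ctas,
                                    const Demand& need,
                                    double latency_budget_us) {
        // Cheapest victims first: CTAs that have made the least progress.
        std::sort(ctas.begin(), ctas.end(),
                  [](const CtaState& a, const CtaState& b) {
                      return a.progress < b.progress;
                  });

        uint32_t regs = 0, smem = 0, threads = 0;
        double latency = 0.0;
        std::vector<int> victims;

        for (const CtaState& c : ctas) {
            if (regs >= need.regs && smem >= need.smem_bytes && threads >= need.threads)
                break;      // demand satisfied; stop evicting
            if (latency + c.evict_time_us > latency_budget_us)
                continue;   // evicting this CTA would violate the latency requirement
            victims.push_back(c.id);
            regs    += c.regs;
            smem    += c.smem_bytes;
            threads += c.threads;
            latency += c.evict_time_us;
        }

        bool satisfied = regs >= need.regs && smem >= need.smem_bytes &&
                         threads >= need.threads;
        return satisfied ? victims : std::vector<int>{};
    }

Sorting by progress is only one plausible proxy for preemption cost; a scheme that minimizes preemption cost under a latency bound, as described in the abstract, could equally weight re-execution time, flush traffic, or the time needed to drain in-flight instructions.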


