  • Thesis

A KVM-Based GPGPU Virtualization Technique for Windows

基於KVM、支援Windows的GPGPU虛擬化技術

Advisor: 鍾葉青

Abstract


Virtualization technology facilitates resource sharing by abstracting the underlying hardware and improving utilization across applications. In recent years, General-Purpose Graphics Processing Units (GPGPUs) have become critical in high-performance computing (HPC). Yet the lack of open-source programming models for GPUs and their architectural design pose significant challenges to using them in virtualized environments. The purpose of this study is to improve the performance of HPC applications that use the Compute Unified Device Architecture (CUDA) on a guest OS. We used QEMU-KVM as the Virtual Machine Monitor (VMM) and Windows 8.1 as the guest OS. A shared imposter library in the guest OS intercepts legitimate CUDA function calls and relays each request to a virtual device driver in the guest. The imposter driver performs initialization, memory allocation, and validation, then sends the packaged request to the virtual device. The virtual device in the VMM dequeues the request and executes it through the legitimate CUDA driver API. When the virtual device completes the request, the results are channeled back to the imposter driver in the guest OS and ultimately presented to the user application as if CUDA were installed in the guest VM. Results show near-host performance for page-locked memory and a substantial improvement for pageable memory when 2 MB large pages are used in the guest OS, owing to fewer TLB misses and a lower cost of translating page virtual addresses to physical addresses and vice versa.

Keywords

CUDA; GPGPU; high-performance computing

Parallel Abstract (translated from Chinese)


Virtualization technology accelerates resource sharing by abstracting the underlying hardware and improving hardware utilization across applications. In recent years, general-purpose graphics processing units (GPGPUs) have become critical in high-performance computing, yet the lack of open-source GPU programming models and the architectural design of GPUs themselves make it quite challenging to use GPGPUs in virtualized environments. The goal of this study is to improve the performance of HPC applications that use CUDA in a virtual machine. We used QEMU-KVM as our virtual machine monitor and Windows 8.1 as the guest operating system, and built a shared intermediary library in the guest OS that legitimately intercepts CUDA function calls and forwards them to an intermediary driver. After initialization, memory allocation, and validation, the intermediary driver sends the request to the virtual device; the virtual device inside the VMM dequeues the request and handles it with the appropriate CUDA API. Once the device completes the request, the result is sent back to the intermediary driver in the guest and finally delivered to the user application as if CUDA were installed in the virtual machine. The results show that when the guest OS uses 2 MB large pages, TLB misses and the cost of translating between page virtual and physical addresses are reduced, yielding a significant performance improvement, while page-locked memory achieves near-native performance.

