In this thesis, we design two bioinformatics tools, GPU-REMuSiC and CUDA ClustalW, that use graphics processing unit (GPU) acceleration to solve sequence alignment problems. In biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences, and multiple sequence alignment (MSA) and constrained multiple sequence alignment (CMSA) are essential methods for studying biological data. By exploiting the computing power of GPUs, GPU-REMuSiC and CUDA ClustalW improve the performance of solving MSA and CMSA problems. However, the traditional execution environment is a major barrier for biologists who want to set up and use these tools. We therefore use virtualization technology so that the tools can be offered as a cloud service.
Virtualization has become a fundamental technology in cloud computing, and the GPU is one of the hardware devices that needs to be virtualized, because it is widely used in high-performance computing, especially general-purpose GPU (GPGPU) computing. Although many GPGPU virtualization frameworks have been proposed, their performance is limited by the bandwidth of data transfers between the virtual machine (VM) and the host. An optimized TCP/IP-based communication method over a high-speed interconnect has been proposed to improve this bandwidth, but it still incurs considerable latency when the network interface is underpowered. Therefore, in this thesis, we design a new virtualization framework, qCUDA, to improve the performance of Compute Unified Device Architecture (CUDA) programs. qCUDA builds on the virtio framework, providing a para-virtualized driver and a device module that together handle API remoting and memory management. Moreover, qCUDA provides a configurable policy for dynamic load balancing across multiple GPUs.
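To make the idea of a configurable load-balancing policy concrete, the following is a minimal sketch and not the qCUDA implementation. The class names (`Gpu`, `Dispatcher`), the two policy names (`least_loaded`, `round_robin`), and the use of an active-job count as the load metric are all assumptions made for illustration; the abstract does not specify which policies qCUDA offers.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical bookkeeping record for one physical GPU on the host."""
    device_id: int
    active_jobs: int = 0


class Dispatcher:
    """Assigns incoming VM requests to GPUs under a configurable policy.

    Both policy names are assumptions for this sketch.
    """

    POLICIES = ("least_loaded", "round_robin")

    def __init__(self, gpus, policy="least_loaded"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.gpus = list(gpus)
        if not self.gpus:
            raise ValueError("at least one GPU is required")
        self.policy = policy
        self._next = 0  # cursor used only by round_robin

    def assign(self):
        """Pick a GPU for a new job and return its device id."""
        if self.policy == "round_robin":
            gpu = self.gpus[self._next % len(self.gpus)]
            self._next += 1
        else:
            # least_loaded: fewest active jobs, ties broken by device id
            gpu = min(self.gpus, key=lambda g: (g.active_jobs, g.device_id))
        gpu.active_jobs += 1
        return gpu.device_id

    def release(self, device_id):
        """Mark one job on the given GPU as finished."""
        for gpu in self.gpus:
            if gpu.device_id == device_id:
                if gpu.active_jobs == 0:
                    raise RuntimeError("no active job to release")
                gpu.active_jobs -= 1
                return
        raise KeyError(device_id)
```

Under the `least_loaded` policy, a dispatcher over two idle GPUs places the first job on device 0 and the second on device 1; once a job on device 0 is released, the next job returns to device 0. The policy is dynamic in the sense that each decision uses the load observed at assignment time.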
In our experiments, we select several benchmarks from the unmodified CUDA SDK, namely bandwidthTest, MatrixMul, vectorAdd, and simpleStreams, all of which exercise the essential steps of GPGPU computing. We also run two practical bioinformatics applications, GPU-REMuSiC and CUDA ClustalW, to demonstrate the practicality of qCUDA. In our test environment, qCUDA achieves more than 95\% of the native bandwidth in most cases. In addition, compared with prior work, qCUDA offers more flexibility and interposition: it can run CUDA-compatible programs in both Linux and Windows VMs on the QEMU-KVM hypervisor.
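For readers unfamiliar with the alignment problem that GPU-REMuSiC and CUDA ClustalW address, the sketch below computes a global pairwise alignment score by dynamic programming (the Needleman-Wunsch recurrence). Progressive MSA tools such as ClustalW begin with pairwise alignments of this kind. This is an illustrative CPU version only, not the GPU code of either tool, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is one common source of parallelism when this kind of recurrence is mapped to a GPU.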