An Instruction-Level State Synchronization Mechanism for KVM Virtual Machines

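Virtual machines are widely used on cloud computing platforms, but errors of various kinds can interrupt the services they host and cause losses of varying severity. Availability therefore needs to be improved so that a system fault does not make a service unusable. Fault-tolerant systems are one application of high availability and are usually implemented in one of two ways. The first, memory-level fault tolerance, achieves synchronization by copying the complete state of the Primary VM to the Backup VM. The second, instruction-level fault tolerance, achieves synchronization by having the Backup VM, starting from the same initial state, re-execute the instructions executed by the Primary VM. However, the open-source fault tolerance projects currently available on KVM are all memory-level. This thesis therefore focuses on instruction-level fault tolerance and implements a system prototype on KVM. Through an instruction-group sequential execution mechanism, the primary and backup VMs execute the same number of instructions in order, while the non-deterministic events that arise during computation on the primary VM are accurately recorded and replayed on the backup VM at the correct moment, which keeps the two VMs synchronized.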

Keywords

KVM; virtual machine; high availability; fault-tolerant system

Parallel Abstract

Virtual machines (VMs) are widely used on cloud computing platforms, and they may fail for many reasons. Once a VM fails, the cloud services running on it fail as well, and the service providers may suffer different levels of property and business losses. To prevent such a failure, one can use fault tolerance technology to protect a VM. That is, a backup VM is used and its execution state is kept synchronized with the VM to be protected. When the protected VM fails, the backup VM replaces it immediately. There are two approaches to implementing a fault tolerance mechanism for VMs. The first, memory-level fault tolerance, synchronizes the memory content of the pair of VMs. The second, instruction-level fault tolerance, executes the same instructions and events in the same order on the pair of VMs. The first type has been seen in open-source projects, while the second type can only be found in VMware. In this paper, we aim to develop a prototype of an instruction-level fault tolerance mechanism on KVM. The proposed mechanism first creates a backup VM by cloning the state of the VM to be protected. It then records non-deterministic events on the protected VM, turns them into deterministic events on the backup VM, and replays them at the right moment. An overhead analysis is provided to show how the replay parameters affect the performance of the proposed fault tolerance mechanism.
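The record-and-replay idea summarized above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the prototype described in this work: `ToyVM`, `primary_group`, `backup_group`, `run_pair`, the 10% event rate, and the group size are all hypothetical names and values chosen for illustration. The primary runs one group of instructions and logs each non-deterministic event together with the instruction count at which it arrived; the backup then runs the same number of instructions and injects each logged event at the same count.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

MASK = 0xFFFFFFFF
Event = Tuple[int, int]  # (instruction count at delivery, payload)


@dataclass
class ToyVM:
    """Hypothetical deterministic machine; state changes only via step() and inject()."""
    state: int = 0
    executed: int = 0  # instructions executed so far

    def step(self) -> None:
        # One deterministic "instruction".
        self.state = (self.state * 31 + 7) & MASK
        self.executed += 1

    def inject(self, payload: int) -> None:
        # Deliver an external event, e.g. an interrupt carrying device data.
        self.state ^= payload


def primary_group(vm: ToyVM, n: int, rng: random.Random) -> List[Event]:
    """Run n instructions on the primary and log every event that arrives."""
    log: List[Event] = []
    for _ in range(n):
        vm.step()
        if rng.random() < 0.1:  # assumed event rate, for illustration only
            payload = rng.getrandbits(32)
            vm.inject(payload)
            log.append((vm.executed, payload))
    return log


def backup_group(vm: ToyVM, n: int, events: List[Event]) -> None:
    """Run n instructions on the backup, replaying events at the logged counts."""
    i = 0
    for _ in range(n):
        vm.step()
        if i < len(events) and events[i][0] == vm.executed:
            vm.inject(events[i][1])
            i += 1


def run_pair(total: int, group_size: int, seed: int) -> Tuple[ToyVM, ToyVM]:
    """Drive both VMs group by group from the same initial state."""
    rng = random.Random(seed)
    primary, backup = ToyVM(), ToyVM()
    done = 0
    while done < total:
        n = min(group_size, total - done)
        events = primary_group(primary, n, rng)  # primary runs first and records
        backup_group(backup, n, events)          # backup replays the same group
        done += n
    return primary, backup


if __name__ == "__main__":
    p, b = run_pair(total=1000, group_size=64, seed=1)
    print(p.state == b.state, p.executed, b.executed)
```

In the toy, the instruction count is kept in software. A real hypervisor would need some reliable way to count guest instructions and to stop the backup at the logged count, which the sketch does not attempt to model.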

Parallel Keywords

KVM; virtual machine; fault tolerance; high availability

