In recent years, virtual machines (VMs) have been widely deployed in the cloud. A single physical machine may host multiple VMs that together provide multiple services. A failure of that machine, such as a hardware malfunction or a power loss, can therefore disrupt many services at the same time, which makes fault tolerance especially important for cloud servers. However, fault tolerance support in commercial products such as VMware is expensive and not affordable for many users. In this thesis, we implement CUJU, an open-source, virtual-machine-based fault tolerance system built on QEMU 2.8 and Linux kernel 4.4.0, and describe its design and operation. Finally, we measure the output latency and throughput overhead of this preliminary version of CUJU and compare its performance and functionality with those of VMware's fault tolerance system.