Implementation and Optimization of a Group Virtual Machine Fault-Tolerance System

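With the rise of cloud computing, it has become common to decompose a single service into multiple communicating microservices to improve development and maintenance efficiency. These microservices usually communicate through software libraries such as the Message Passing Interface (MPI). Existing checkpoint-based fault-tolerance systems use output buffering to achieve seamless failover: even when a fault occurs, application users do not notice that the service has moved to another server. However, output buffering lowers network transmission efficiency. Outgoing packets are held until the checkpoint covering them has been acknowledged, so network-intensive applications suffer severe performance degradation under such systems. In this thesis, we propose the concept of Group Virtual Machine Fault Tolerance. It improves the performance of distributed services under fault tolerance by eliminating output buffering for communication within the group, and we present evaluation data on the performance impact of this approach. Because intra-group output is no longer buffered, the original checkpoint and failover procedures must be modified to keep the memory states of the VMs in the group consistent. In addition, this thesis proposes a method that reduces the downtime of the whole group caused by the start-up and restart of the fault-tolerance protocol, by avoiding the memory transfer of some of the VMs in the group.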
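The output-buffering trade-off described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The `Packet` and `GroupOutputBuffer` names, the use of destination addresses to decide group membership, and the single commit callback are all assumptions made for illustration. Packets addressed to another VM of the same group are released at once. Packets leaving the group are held until the checkpoint covering the state that produced them has been acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes


@dataclass
class GroupOutputBuffer:
    """Hypothetical per-VM output buffer for group fault tolerance (illustrative sketch)."""

    group: frozenset  # assumption: addresses of all VMs protected as one group
    pending: list = field(default_factory=list)  # outbound packets held in the current epoch
    wire: list = field(default_factory=list)  # packets released to the network

    def send(self, pkt: Packet) -> None:
        if pkt.dst in self.group:
            # Intra-group traffic bypasses the buffer; the receiver is assumed
            # to be checkpointed in the same group-wide epoch as the sender.
            self.wire.append(pkt)
        else:
            # Traffic leaving the group is held until the checkpoint covering
            # the state that produced it has been acknowledged by the backup.
            self.pending.append(pkt)

    def on_checkpoint_committed(self) -> None:
        # The backup now holds this epoch's state, so buffered output is safe to release.
        self.wire.extend(self.pending)
        self.pending.clear()


buf = GroupOutputBuffer(group=frozenset({"10.0.0.1", "10.0.0.2"}))
buf.send(Packet("10.0.0.1", "10.0.0.2", b"intra-group message"))  # released immediately
buf.send(Packet("10.0.0.1", "203.0.113.7", b"reply to client"))  # held until commit
print(len(buf.wire), len(buf.pending))  # 1 1
buf.on_checkpoint_committed()
print(len(buf.wire), len(buf.pending))  # 2 0
```

In this sketch only traffic that leaves the group pays the buffering delay. Letting intra-group traffic bypass the buffer is safe only if the whole group is checkpointed and restored consistently, which is why the checkpoint and failover procedures must change.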

Keywords

Virtualization; Fault-Tolerance System; Distributed System

English Abstract

With the rise of cloud computing, it is possible to break up a single service into multiple components that communicate with each other using a message-passing library such as MPI, which makes software development and testing easier. Existing checkpoint-based fault-tolerance systems use output buffering to realize seamless service failover, that is, to ensure that application end users are not aware of the failover when a hardware fault occurs. However, applications with a large amount of inter-process communication experience non-negligible communication overhead due to output buffering. In this thesis, we propose the concept of Group Virtual Machine Fault Tolerance, which enables fault-tolerance protection for a distributed service without buffering intra-group communication. The checkpoint and failover procedures have been modified to maintain the consistency of memory state within the group, and an evaluation of this approach is given. An optimization that reduces the system downtime caused by the initialization and re-initialization of the group fault-tolerance protocol is also introduced and evaluated.
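One plausible way to keep memory state consistent within the group is to commit a checkpoint epoch only after the backups of all member VMs have acknowledged it. Any failure then rolls every member back to the same epoch. The Python sketch below shows such a coordinator under that assumption. `GroupCheckpointCoordinator`, `ack`, and `failover_epoch` are hypothetical names, and the thesis's actual protocol may differ.

```python
from dataclasses import dataclass, field


@dataclass
class GroupCheckpointCoordinator:
    """Hypothetical coordinator: an epoch commits only when every member's backup has acknowledged it."""

    members: tuple  # names of the VMs in the protected group
    committed_epoch: int = 0  # last epoch every member can be restored to
    acks: dict = field(default_factory=dict)  # epoch -> set of members whose backup acknowledged

    def ack(self, vm: str, epoch: int) -> bool:
        """Record one backup acknowledgement; return True if it completes the group epoch."""
        if vm not in self.members:
            raise ValueError(f"unknown VM {vm!r}")
        if epoch <= self.committed_epoch:
            return False  # stale acknowledgement for an epoch already committed
        done = self.acks.setdefault(epoch, set())
        done.add(vm)
        if done == set(self.members):
            self.committed_epoch = epoch
            # Discard bookkeeping for this and any older, now-superseded epochs.
            for old in [e for e in self.acks if e <= epoch]:
                del self.acks[old]
            return True
        return False

    def failover_epoch(self) -> int:
        """On any member's failure, the whole group resumes from this epoch."""
        return self.committed_epoch


coord = GroupCheckpointCoordinator(members=("vm1", "vm2"))
coord.ack("vm1", 1)
print(coord.failover_epoch())  # 0: vm2's backup has not acknowledged epoch 1 yet
coord.ack("vm2", 1)
print(coord.failover_epoch())  # 1
```

Intra-group messages are not buffered. A member rolling back on its own could therefore leave its peers holding messages that, from its restored state, were never sent. Committing and restoring epochs group-wide avoids such orphaned messages.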

English Keywords

Virtualization; Fault-Tolerance; Distributed System

