
群組虛擬機容錯系統實作與優化

Implementation and Optimization of Group Virtual Machine Fault Tolerance

Advisor: 徐慰中

Abstract


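With the rise of cloud computing, it has become common to split a single service into many microservices that communicate with one another, which makes development and maintenance more efficient. These microservices mostly communicate through software libraries such as the Message Passing Interface (MPI). Existing checkpoint-based fault-tolerance systems use output buffering to achieve seamless failover: even when a fault occurs, application users do not notice that the server providing the service has changed. However, output buffering reduces network transmission efficiency, so network-intensive applications suffer severe performance degradation when run under such a system. In this thesis, we propose the concept of group virtual machine fault tolerance, which aims to improve the performance of distributed services under fault tolerance by removing output buffering for traffic inside the group, and we provide evaluation data on the performance impact of this approach. The original checkpoint and failover procedures must also be modified to account for the removal of output buffering, so that the memory states of the virtual machines in the group remain consistent. In addition, this thesis proposes a method for reducing the downtime caused by the protocol that starts and restarts fault tolerance: by avoiding memory transfer for some of the virtual machines in the group, the downtime of the whole group is reduced.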

English Abstract


With the rise of cloud computing, it has become possible to break up a single service into multiple components that communicate with each other through a message-passing library such as MPI, which improves software development and testing. Existing checkpoint-based fault-tolerance systems use output buffering to realize seamless service failover, that is, to make sure that application end users are not aware of the failover when a hardware fault occurs. However, applications with a large amount of inter-process communication experience non-negligible communication overhead because of output buffering. In this thesis, we propose the concept of Group Virtual Machine Fault Tolerance, which enables fault-tolerance protection for a distributed service without the need to buffer intra-group communication. Modifications to the checkpoint and failover procedures have been made in order to maintain the consistency of memory state within the group, and an evaluation of this approach is given. An optimization that reduces the system downtime caused by the initialization and re-initialization of the Group Fault-Tolerance protocol is also introduced and evaluated in this work.
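The output-buffering idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the class `OutputBuffer`, the `Packet` type, the method names, and the VM names are all hypothetical, and the sketch only models the release rule (hold externally visible output until the covering checkpoint is acknowledged, let traffic between group members pass unbuffered). It does not model the coordinated checkpoint and failover changes that the thesis says are needed to keep the group's memory state consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: str
    dst: str
    payload: str


@dataclass
class OutputBuffer:
    """Hypothetical model of output buffering with an intra-group bypass."""
    group: frozenset                              # VMs protected together (assumed names)
    pending: list = field(default_factory=list)   # held until the checkpoint is acknowledged
    delivered: list = field(default_factory=list)  # packets that left the host

    def send(self, pkt: Packet) -> None:
        if pkt.dst in self.group:
            # Intra-group traffic: delivered immediately, no buffering delay.
            self.delivered.append(pkt)
        else:
            # Externally visible output: hold it until the backup has the
            # checkpoint that covers the state which produced it.
            self.pending.append(pkt)

    def checkpoint_acked(self) -> None:
        # The backup confirmed the checkpoint, so buffered output is safe to release.
        self.delivered.extend(self.pending)
        self.pending.clear()

    def failover(self) -> int:
        # The primary failed before the acknowledgement: buffered output is
        # dropped, so outside clients never saw state the backup lacks.
        dropped = len(self.pending)
        self.pending.clear()
        return dropped


buf = OutputBuffer(group=frozenset({"vm-a", "vm-b"}))
buf.send(Packet("vm-a", "vm-b", "internal request"))  # bypasses the buffer
buf.send(Packet("vm-a", "client", "response"))        # held back
buf.checkpoint_acked()                                # response released here
```

In this model the latency cost of buffering falls only on packets that leave the group, which is the performance argument the abstract makes for communication-heavy distributed services.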

