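A Periodic Backup Scheduling and Fault-Tolerance Mechanism for Distributing System Load in a Cloud Resource Management and Control System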

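In cloud computing, the resource management and control system carries a heavy responsibility: it must balance executing tasks against maintaining the user experience. This paper presents an improved periodic backup scheduling and fault-tolerance mechanism that handles load distribution, error handling, and scheduling for virtual machine data backup tasks in a virtualized resource management and control system. The load-distribution design has three stages. In the first stage, task times are initially spread out, using random numbers or a calculation on specific parameters, so that load does not become overly concentrated. In the second stage, the number of threads processing tasks is adjusted flexibly according to system load. In the third stage, when tasks are scheduled, the system considers parameters such as the current threads, the number of pending tasks, and the estimated time each operation requires, in order to decide whether the backup workload per unit time in this scheduling round is too heavy; if it is, the system can postpone the scheduled tasks directly, or even change their execution times so that the tasks are spread out again. When the fault-tolerance mechanism detects that a task has failed, or that a particular task keeps occupying resources, it forcibly interrupts the task. Depending on user requirements, the task can then be re-executed a number of times or the current scheduled run can be skipped, so that a few failing or resource-occupying tasks do not affect the operation of the whole system. By distributing system load through these designs, and by using the fault-tolerance mechanism to keep execution errors or resource occupation from causing backup schedules to fail, the cloud resource management and control system can preserve the user experience while correctly completing both routine backup tasks and sudden bursts of them.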
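The three load-distribution stages described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `BackupTask`, `spread_start`, `choose_thread_count`, and `admit_or_defer` are hypothetical, and the specific policies (a uniform random offset, a linear mapping from load to thread count, and a capacity of threads multiplied by slot length) are assumptions chosen only to make the idea concrete.

```python
import random
from dataclasses import dataclass


@dataclass
class BackupTask:
    vm_id: str
    start_min: int    # scheduled start, in minutes from the current slot's origin
    est_minutes: int  # estimated run time of the backup


def spread_start(base_min: int, window_min: int, rng: random.Random) -> int:
    """Stage 1 (assumed policy): add a random offset within the window
    so that tasks do not all start at the same moment."""
    return base_min + rng.randrange(window_min)


def choose_thread_count(load: float, min_threads: int, max_threads: int) -> int:
    """Stage 2 (assumed policy): map system load in [0.0, 1.0] linearly to a
    worker count, using fewer threads when the system is busier."""
    load = min(max(load, 0.0), 1.0)
    return max(min_threads, round(max_threads - load * (max_threads - min_threads)))


def admit_or_defer(tasks, threads, slot_min, rng):
    """Stage 3 (assumed policy): admit tasks while their estimated work fits
    into threads * slot_min minutes; defer the rest to the next slot, giving
    each deferred task a fresh random start so they are spread out again."""
    capacity = threads * slot_min
    admitted, deferred, used = [], [], 0
    for task in sorted(tasks, key=lambda t: t.start_min):
        if used + task.est_minutes <= capacity:
            admitted.append(task)
            used += task.est_minutes
        else:
            # The next slot begins slot_min minutes after the current origin.
            new_start = spread_start(slot_min, slot_min, rng)
            deferred.append(BackupTask(task.vm_id, new_start, task.est_minutes))
    return admitted, deferred
```

With two threads and a 60-minute slot, for example, tasks estimated at 30, 40, and 60 minutes would see the first two admitted and the third deferred to a random start in the following slot.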

Keywords

cloud computing; virtualization; periodic backup; scheduling; fault tolerance; load

Parallel Abstract

The resource management and control system in cloud computing bears great responsibility for balancing task execution against a good user experience. This article presents an improved backup scheduler and fault-tolerance mechanism that deals mainly with workload distribution, error handling, and task scheduling for virtual machine backup tasks in such a system. The system distributes workload in three stages. First, it uses random values or specific parameters to spread task start times so that load is not concentrated. Second, it adjusts the number of threads according to monitored system workload. Finally, it reviews the current number of threads, the tasks waiting in the schedule, and the expected time to finish each job to determine whether it can handle the tasks in the current time period; if it cannot, it postpones the tasks or distributes them again. The fault-tolerance mechanism interrupts tasks that have failed or are hanging. Users can decide whether an interrupted task should be retried several times or skipped until its next scheduled run. This keeps the system working normally, so that it is not affected by individual task failures. With these distribution designs and the fault-tolerance mechanism, the cloud computing management and control system can complete backup tasks successfully while still providing a good user experience.
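The interrupt, retry, or skip behaviour described above can be sketched as a small wrapper. This is only an illustration under stated assumptions: the function name `run_backup` is hypothetical, each backup is assumed to run as an external command, a hanging task is detected by a simple timeout, and a non-zero exit code is treated as failure. The abstract does not say how the original system detects or interrupts tasks.

```python
import subprocess


def run_backup(cmd, timeout_s, max_retries):
    """Run one backup command with a time limit (assumed detection policy).

    A run that exceeds timeout_s is killed; a non-zero exit code counts as a
    failure. The task is retried up to max_retries times and then skipped
    until its next scheduled run. Returns (status, attempts).
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return "done", attempts
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child process, so the
            # hanging task no longer occupies resources.
            pass
    return "skipped", attempts
```

Setting `max_retries` to 0 models a user who prefers to skip a failed run and wait for the next schedule, while a larger value models re-execution "several times" before giving up.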

Parallel Keywords

cloud computing; virtualization; backup; schedule; fault tolerance; workload
