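A growing number of software developers and IT professionals are embracing container technology, which packages the required resources and settings into an image; containers created from that image avoid many compatibility problems. Container clustering tools are likewise becoming indispensable to meet the needs of automated development and complex systems. In this thesis, we propose strengthening Docker Swarm clusters with checkpoint and restore techniques. There are two main points: a high-availability system and container migration. With multi-version checkpoints, we can periodically checkpoint specific containers and store the checkpoints in cloud storage; if a node in the cluster goes offline abnormally, the most recent container state can be promptly restored on a healthy node. In addition, through the Docker Swarm scheduler, we can more easily move containers among multiple nodes. According to our experimental results, using pre-dump and track-memory saves 10% to 20% of the time and more than 200% of the storage space.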
More and more software developers and information-technology professionals are embracing container technology, because it packs the libraries and settings that a piece of software requires into a single image, so that the software can be deployed anywhere without compatibility issues. Beyond single containers, orchestration tools are essential for automating the deployment and operation of complex systems that involve multiple containers on a cluster of machines. In this thesis, we present the idea of using the checkpoint-and-restore technique to enhance the functionality of Docker Swarm, a state-of-the-art orchestration tool for Docker containers. We focus on two major functions: high availability and migration. For high availability, we use multi-version checkpoints to improve the availability of containers and investigate the optimal storage and performance for checkpoint-and-restore. For migration, we leverage shared storage and Docker Swarm's scheduler to make migration easier. We also study the possibility of live migration in our implementation. Experiments show that pre-dump and track-memory reduce the time a container is frozen during a checkpoint by about 10% to 20% and save at least 200% storage space.
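To make the pre-dump and multi-version ideas concrete, the following is a minimal sketch rather than the implementation used in this thesis. It assumes that "pre-dump" and "track-memory" refer to CRIU's `pre-dump` action and `--track-mem` option, and it only builds the command lines instead of running them. The helper names, the `v0001`-style version labels, and the retention policy are hypothetical and chosen for illustration.

```python
from typing import List


def pre_dump_cmd(pid: int, images_dir: str) -> List[str]:
    """Build a CRIU pre-dump command.

    A pre-dump copies memory pages while the process keeps running and
    turns on memory tracking, so a later dump can skip unchanged pages.
    """
    return ["criu", "pre-dump", "--tree", str(pid),
            "--images-dir", images_dir, "--track-mem"]


def final_dump_cmd(pid: int, images_dir: str, prev_dir: str) -> List[str]:
    """Build the final CRIU dump command.

    The process is frozen only for this step. prev_dir points at the
    earlier pre-dump images and is interpreted relative to images_dir.
    """
    return ["criu", "dump", "--tree", str(pid),
            "--images-dir", images_dir,
            "--prev-images-dir", prev_dir, "--track-mem"]


def versions_to_prune(versions: List[str], keep: int) -> List[str]:
    """Return the oldest checkpoint versions beyond the newest `keep`.

    Assumes (hypothetically) that version labels sort chronologically,
    for example zero-padded sequence numbers such as "v0001".
    """
    ordered = sorted(versions)
    return ordered[:-keep] if keep > 0 else ordered
```

Under these assumptions, a periodic job would issue the pre-dump command first, then the final dump against the same process, upload the resulting images to shared storage, and delete whatever `versions_to_prune` returns so that only the newest few versions are retained for recovery.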