Efficient and Adaptive Stateful Replication in High Availability Clusters

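In high availability clusters, advanced network devices track a large number of network connections to perform their designed functions, and replicate the state of each connection to backup devices to provide stable and reliable network service. Existing mechanisms precisely replicate every state change of every connection, but under high traffic loads these solutions consume considerable bandwidth. This dissertation takes two different approaches to the problem: performing state replication with data structures built from hash functions, and dynamically monitoring the system's CPU load to decide whether to perform state replication. First, this dissertation proposes two new data structures, called Flow Digest (FD) and Multi-level Counting Bloom Filter (MLCBF), as low-resource solutions for state replication. According to a review and analysis of the prior literature, this is the first work to introduce hash functions into the state replication mechanism of high availability clusters. Mathematical analysis and extensive testing (including simulations driven by real network traffic and tests on a testbed) are used to evaluate the performance and tradeoffs of the proposed methods. In addition, this dissertation proposes a dynamic mechanism, called Dynamic Lazy Insertion (DLI), that prevents a network device's replication mechanism from continually adding load to an already overloaded system. Tests on a real testbed demonstrate its feasibility and effectiveness.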

Keywords

high availability clusters; stateful replication; hierarchical hashing architecture

Parallel Abstract

Various stateful stream processing engines (SPEs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality in high availability clusters (HACs). Under high traffic loads, existing solutions in such HACs are expensive because they replicate state precisely. In this dissertation, I study two methods to address this issue: randomization of replication messages, and a replication scheme designed for when the system is about to be overloaded. Two new hierarchical structures, called Flow Digest (FD) and Multi-Level Counting Bloom Filter (MLCBF), are proposed as low-resource solutions for stateful replication. To the best of my knowledge, this is the first time in the literature that randomization has been introduced for stateful replication in HACs. Analysis and extensive tests are employed to evaluate the performance and tradeoffs of the proposed techniques. Most importantly, MLCBF is simple and practical to implement and maintain. Furthermore, an adaptive scheme, called dynamic lazy insertion, is designed to prevent replication from overloading the system and to optimize the pass-through performance of the HAC dynamically. Testbed evaluation demonstrates its feasibility and effectiveness under real conditions.

Parallel Keywords

multiple hashing; Bloom filter; replication; high-availability clusters

