
Design and Implementation of High Availability Routers

Advisor: 簡榮宏 (Rong-Hong Jan)

Abstract


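As network technology advances, people rely on networks more and more. For network service providers, offering a highly available network environment, in which users do not notice interruptions while accessing the network, is an important and pressing problem. In this dissertation, we use a continuous-time Markov chain to derive an availability function. To reach carrier-grade availability, a service provider only needs to supply four parameters, namely the number of active routers (M), the router failure rate (λ), the router repair rate (μ), and the failure detection and recovery rate (δ), and the function gives the number of standby routers (N) that must be deployed. Numerical analysis shows that the failure detection and recovery rate is the dominant parameter for reducing the number of standby routers: the larger δ is, the fewer standby routers are needed.

When a standby router takes over packet forwarding, it must exchange link-state information with its neighboring routers again to rebuild its network topology table, and this interrupts packet forwarding. To shorten the interruption and thus increase δ, we adopt stateful backup: while an active router is running, it synchronizes its link-state database to the standby router. At takeover, the standby router immediately builds the network topology and computes its routing table from the link-state database it has already received, so it can go online at once without requesting link-state data from its neighbors. This effectively shortens the takeover interruption.

To let the active routers synchronize their link-state databases to the standby router, we adapted the OpenAIS system and propose a High Availability Management (HAM) middleware, which shortens the network interruption during takeover and thereby increases the failure detection and recovery rate.

We installed the HAM middleware on personal computers (PCs) and measured its performance using OSPF as an example. The experimental results show that the network interruption during a standby takeover is about 6%, 37.3%, and 98.6% shorter than that of the Cisco ASR 1000 series router, the Juniper MX series router, and a VRRP router, respectively.

We also installed the HAM middleware on an ATCA machine. ATCA is a platform with an industry-standard modular architecture that supports an efficient, flexible, and reliable router design. On the ATCA platform, the takeover interruption (1/δ) is 217 ms for a software failure and 1066 ms for a hardware failure. Assuming a router mean time to failure (1/λ) of 7 years and a mean time to repair (1/μ) of 4 hours, the ATCA-based HA router therefore achieves availabilities of 99.99999905% and 99.99999867%, respectively, both of which meet the carrier-grade availability standard.

In summary, because the proposed HA router is built on an open-standard specification, it costs less and is more cost-effective than commercial routers, and its redundancy scheme can be adjusted more flexibly and effectively to suit network deployment and usage.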
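The abstract says only that A(M, N, λ, μ, δ) is derived from a continuous-time Markov chain; the chain itself is not given here. The Python sketch below builds one plausible chain under hypothetical assumptions (cold standby, independent repairs, service down during an exponentially distributed takeover with rate δ, no failures or repairs during takeover), solves its steady state with the GTH state-reduction algorithm, and searches for the smallest N that reaches a target availability ρ. It illustrates the kind of computation involved; it is not the dissertation's model, and its numbers need not match the reported ones. The function names are illustrative.

```python
"""Hypothetical CTMC sketch of A(M, N, λ, μ, δ) for M active + N standby routers.

Modelling assumptions (illustrative only, not the dissertation's chain):
  * only routers that are forwarding fail, each at rate lam (cold standby);
  * every failed router is repaired independently at rate mu;
  * if an active router fails while a healthy standby exists, service is down
    for an exponential takeover time with rate delta (mean 1/delta);
  * failures and repairs are ignored during the short takeover;
  * service is "up" only when M healthy routers forward traffic and no
    takeover is in progress.
"""


def steady_state_gth(rates):
    """Stationary distribution of an irreducible CTMC (GTH state reduction).

    rates[i][j] is the transition rate from state i to state j (i != j).
    """
    n = len(rates)
    a = [row[:] for row in rates]
    for k in range(n - 1, 0, -1):
        s = sum(a[k][j] for j in range(k))  # exit rate of k toward lower states
        for i in range(k):
            a[i][k] /= s
        for i in range(k):
            for j in range(k):
                a[i][j] += a[i][k] * a[k][j]  # fold paths through k into i -> j
    pi = [1.0] + [0.0] * (n - 1)
    for k in range(1, n):
        pi[k] = sum(pi[i] * a[i][k] for i in range(k))
    total = sum(pi)
    return [p / total for p in pi]


def availability(M, N, lam, mu, delta):
    """Steady-state availability under the assumptions in the module docstring."""
    # State (k, 0): k routers failed, no takeover; (k, 1): takeover in progress.
    states = [(k, 0) for k in range(M + N + 1)] + [(k, 1) for k in range(1, N + 1)]
    index = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]

    def add(src, dst, rate):
        q[index[src]][index[dst]] += rate

    for k in range(M + N + 1):
        working = min(M, M + N - k)           # routers currently forwarding
        if working > 0:
            if k + 1 <= N:                    # a healthy standby remains
                add((k, 0), (k + 1, 1), working * lam)
            else:                             # no standby left: capacity lost
                add((k, 0), (k + 1, 0), working * lam)
        if k > 0:
            add((k, 0), (k - 1, 0), k * mu)   # independent repairs
    for k in range(1, N + 1):
        add((k, 1), (k, 0), delta)            # takeover completes
    pi = steady_state_gth(q)
    down = sum(p for (k, t), p in zip(states, pi) if t == 1 or k > N)
    return 1.0 - down


def min_standby(M, lam, mu, delta, rho, n_max=10):
    """Smallest N with availability >= rho, or None if none up to n_max."""
    for N in range(n_max + 1):
        if availability(M, N, lam, mu, delta) >= rho:
            return N
    return None


if __name__ == "__main__":
    lam = 1 / (7 * 8760)       # 1/λ = 7 years, as a per-hour rate
    mu = 1 / 4                 # 1/μ = 4 hours
    delta = 3600 / 0.217       # 1/δ = 217 ms
    print(availability(2, 1, lam, mu, delta))
    print(min_standby(2, lam, mu, delta, rho=0.99999))    # 1
    print(min_standby(2, lam, mu, 60.0, rho=0.9999999))   # None: 1-minute takeover
```

With M = 2 and the parameters quoted in the abstract, this sketch needs one standby router for five-nines, whereas a one-minute takeover (δ = 60 per hour) cannot reach seven-nines with any N up to 10, because every failure still costs a full takeover. This echoes the abstract's observation that δ, rather than N, is the main lever.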

English Abstract


How to optimally allocate redundant routers in high availability (HA) networks is a crucial problem. In this dissertation, we propose a five-tuple availability function, A(M, N, λ, μ, δ), to determine the minimum number of standby routers needed to meet the desired availability (ρ) of an HA router, where M and N are the numbers of active and standby routers, respectively, and λ, μ, and δ are a single router's failure rate, repair rate, and failure detection and recovery rate, respectively. We derive the availability function, and the analytical results show that the failure detection and recovery rate (δ) is the key parameter for reducing the minimum number of standby routers. We therefore also propose a High Availability Management (HAM) middleware, built on OpenAIS, an open-source implementation of the Service Availability Forum's Application Interface Specification, that reduces the takeover delay (1/δ) through stateful backup. To demonstrate the approach, we implemented an HA Open Shortest Path First (HA-OSPF) router consisting of two active routers and one standby router. Experimental results show that the takeover delay of the proposed HA-OSPF router is 6%, 37.3%, and 98.6% shorter than that of the Cisco ASR 1000 series router, the Juniper MX series router, and a VRRP (Virtual Router Redundancy Protocol) router, respectively. We have also implemented the HA-OSPF router on an ATCA (Advanced Telecommunications Computing Architecture) platform, which provides an industry-standard modular architecture for an efficient, flexible, and reliable router design.
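The stateful-backup idea can be illustrated with a small Python sketch. The names `StandbyRouter`, `on_lsa_sync`, `take_over`, and `shortest_path_first` are hypothetical and are not part of OpenAIS or of the dissertation's HAM middleware. The sketch assumes that the active router pushes each link-state advertisement (LSA) to the standby as it is installed, and that on takeover the standby assumes the failed router's identity and runs a plain Dijkstra SPF over its replica instead of re-synchronizing the database with its neighbors. Real OSPF details (LSA sequence numbers and aging, two-way link checks, equal-cost multipath) are omitted.

```python
import heapq


def shortest_path_first(lsdb, root):
    """Plain Dijkstra SPF over an LSDB.

    lsdb maps each advertising router to {neighbor: link cost}.
    Returns {destination: (total cost, first-hop neighbor of root)}.
    """
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (nd, nbr))
    return {dst: (dist[dst], first_hop[dst]) for dst in dist if dst != root}


class StandbyRouter:
    """Hypothetical standby that keeps a replica of the active routers' LSDB."""

    def __init__(self):
        self.lsdb = {}

    def on_lsa_sync(self, origin, links):
        """Called for every LSA the active router installs (stateful backup)."""
        self.lsdb[origin] = dict(links)

    def take_over(self, failed_router_id):
        """Assume the failed router's identity and compute routes immediately
        from the replicated LSDB, with no database exchange with neighbors."""
        return shortest_path_first(self.lsdb, failed_router_id)


if __name__ == "__main__":
    standby = StandbyRouter()
    # LSAs that the active router R1 would have checkpointed while running.
    for origin, links in {
        "R1": {"R2": 10, "R3": 5},
        "R2": {"R1": 10, "R4": 1},
        "R3": {"R1": 5, "R4": 20},
        "R4": {"R2": 1, "R3": 20},
    }.items():
        standby.on_lsa_sync(origin, links)
    print(standby.take_over("R1"))
    # {'R2': (10, 'R2'), 'R3': (5, 'R3'), 'R4': (11, 'R2')}
```

The point of the design is that the expensive part of recovery (rebuilding the LSDB) happens continuously before the failure, so the takeover itself costs only one SPF run.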
On our ATCA-based platform, the takeover delay is 1/δ = 217 ms for a software failure and 1/δ = 1066 ms for a hardware failure. Combined with router-module data obtained from Cisco (mean time to failure 1/λ = 7 years and mean time to repair 1/μ = 4 hours), these give availabilities of 99.99999905% for a software failure and 99.99999867% for a hardware failure for the proposed ATCA-based HA-OSPF router. The experimental results therefore show that both the ATCA-based and the PC-based HA-OSPF routers readily meet the carrier-grade requirement of five-nines (99.999%) availability. Moreover, because it is built on an open architecture specification, the proposed HA router is more cost-effective than commercial routers, and its redundancy model can be adjusted more flexibly.
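To relate these percentages to wall-clock terms, the short Python snippet below converts the quoted mean times into per-hour rates and turns each availability figure into expected downtime per year. It assumes a 365-day year, and the helper names `rate_per_hour` and `annual_downtime_seconds` are illustrative.

```python
HOURS_PER_YEAR = 365 * 24                 # 8,760 h; a 365-day year is assumed
SECONDS_PER_YEAR = HOURS_PER_YEAR * 3600  # 31,536,000 s


def rate_per_hour(mean_time_hours):
    """Convert a mean time (MTTF, MTTR or takeover delay) in hours to a rate per hour."""
    return 1.0 / mean_time_hours


def annual_downtime_seconds(availability_percent):
    """Expected downtime per year implied by an availability given in percent."""
    return (1.0 - availability_percent / 100.0) * SECONDS_PER_YEAR


lam = rate_per_hour(7 * HOURS_PER_YEAR)   # 1/λ = 7 years
mu = rate_per_hour(4)                     # 1/μ = 4 hours
delta_sw = rate_per_hour(0.217 / 3600)    # 1/δ = 217 ms, software failure
delta_hw = rate_per_hour(1.066 / 3600)    # 1/δ = 1066 ms, hardware failure
print(f"λ = {lam:.3e}/h, μ = {mu}/h, δ_sw = {delta_sw:.0f}/h, δ_hw = {delta_hw:.0f}/h")

for label, a in [("five-nines target", 99.999),
                 ("ATCA, software failure", 99.99999905),
                 ("ATCA, hardware failure", 99.99999867)]:
    print(f"{label}: {annual_downtime_seconds(a):.2f} s of downtime per year")
```

Five-nines allows about 315 s (roughly 5.3 minutes) of downtime per year, while the two ATCA figures correspond to about 0.30 s and 0.42 s per year, which is why both comfortably clear the carrier-grade bar.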
