A Fault-Tolerant Management Architecture for the HDFS Distributed File System

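With the rapid growth of the Internet, many applications have shifted from single-machine operation to multi-machine operation over the network. This shift has also driven the development of cloud computing technologies. Examples include Google's MapReduce, GFS, and BigTable, and the Hadoop project funded by Yahoo. The Hadoop Distributed File System (HDFS) used by Hadoop adopts a master/slave architecture: a single NameNode manages the entire system, while multiple DataNodes store the data. In this configuration, a single node holds a large amount of critical metadata. If that node fails and its data files are corrupted, the whole system stops working. This is the Single Point of Failure (SPOF) problem, and an SPOF can cause heavy losses for the entire system. Moreover, in a conventional master/slave HDFS, every request and response must pass through the master node. As a result, a large volume of network traffic converges on the NameNode, which slows the network, lengthens round trips, and degrades overall system performance. This research therefore operates at the granularity of a job: each job is dynamically assigned a Sub_NameNode that manages it. This relieves network congestion and speeds up communication between the master and the slaves. Because the metadata is distributed across different nodes, the risk of data corruption is also spread out. The points at which an SPOF can occur are classified into two types, the NameNode and the Sub_NameNode. For each type, an effective SPOF-handling method is proposed to reduce its impact.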
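The abstract does not spell out how a Sub_NameNode is placed. The keywords Centroid Point and Routing hops suggest it is chosen near the centroid of the nodes a job uses. The Python sketch below assumes exactly that and treats placement as a 1-median problem: it picks the node that minimizes the total hop count to the DataNodes holding the job's data. The helper names `bfs_hops` and `pick_sub_namenode`, the unweighted topology graph, and the toy line network are illustrative assumptions, not the thesis's implementation.

```python
from collections import deque


def bfs_hops(graph, source):
    """Hop count from source to every reachable node (unweighted BFS)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return hops


def pick_sub_namenode(graph, job_datanodes, candidates=None):
    """Hypothetical helper: pick the 1-median (centroid) node for one job.

    Returns (node, total_hops) for the candidate that minimizes the sum of
    routing hops to all DataNodes holding the job's data, or (None, inf)
    if no candidate can reach every one of those DataNodes.
    """
    if candidates is None:
        candidates = list(graph)
    best, best_cost = None, float("inf")
    for cand in candidates:
        hops = bfs_hops(graph, cand)
        if any(d not in hops for d in job_datanodes):
            continue  # candidate is cut off from part of the job's data
        cost = sum(hops[d] for d in job_datanodes)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


if __name__ == "__main__":
    # Toy line topology A - B - C - D - E (undirected adjacency lists).
    topology = {
        "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D"],
    }
    print(pick_sub_namenode(topology, ["A", "C", "E"]))  # ('C', 4)
```

In the toy run, node C sits two hops from both A and E, so its total of 4 hops beats B and D (5 each). Minimizing the hop sum, rather than the worst-case distance, is one plausible reading of "centroid"; a minimax (1-center) criterion would be an equally valid alternative.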

Keywords

HDFS ； Sub_NameNode ； Centroid Point ； Routing hops

Abstract (English)

Due to the rapid development of the modern Internet, many applications have changed their mode of operation from a single machine to a cluster of machines connected over the network. This trend also contributed to the development of cloud computing technology. Google invented the MapReduce framework, the Google File System (GFS), and BigTable, and Yahoo invested in the open-source Hadoop project, which implements the technologies proposed by Google. The Hadoop Distributed File System (HDFS) uses the master/slave model to manage the entire file system. Specifically, a single NameNode acting as the master manages a large number of slaves called DataNodes. Because the NameNode maintains critical metadata, a NameNode crash can render the entire file system unusable. In other words, the NameNode is a Single Point of Failure (SPOF). In addition, in the master/slave model all requests and responses must go through the master. Without load sharing, the NameNode therefore becomes a performance bottleneck. This research proposes dynamically allocating a Sub_NameNode to each MapReduce job. The goals are to relieve network congestion and to speed up communication between the master and the slaves. Our approach also reduces the risk of data loss by replicating the metadata to the Sub_NameNodes. If the NameNode fails, its state can be reconstructed from the Sub_NameNodes. The simulation results show significant reductions in both the number of communication hops and the communication time.
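The abstract says that the NameNode's state can be reconstructed from the Sub_NameNodes after a crash. That reconstruction can be sketched as a merge of per-job metadata. This is a minimal Python sketch under assumed data structures:

- `SubNameNodeSnapshot`, `rebuild_namenode`, and the block IDs are hypothetical.
- Each Sub_NameNode is assumed to hold complete metadata for its own job.
- Replica locations reported by different Sub_NameNodes are simply unioned.

A real HDFS namespace also carries permissions, leases, and edit-log state, all of which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class SubNameNodeSnapshot:
    """Hypothetical metadata that one Sub_NameNode keeps for its job."""
    job_id: str
    # file path -> ordered list of block IDs
    files: dict = field(default_factory=dict)
    # block ID -> set of DataNodes holding a replica
    block_locations: dict = field(default_factory=dict)


def rebuild_namenode(snapshots):
    """Merge Sub_NameNode snapshots into a single namespace.

    Assumes each Sub_NameNode holds a full copy of its own job's metadata.
    Raises ValueError if two snapshots disagree on a file's block list.
    """
    files, locations = {}, {}
    for snap in snapshots:
        for path, blocks in snap.files.items():
            if path in files and files[path] != blocks:
                raise ValueError(f"conflicting block lists for {path}")
            files[path] = list(blocks)
        for block, nodes in snap.block_locations.items():
            # Union the replica sets reported by different Sub_NameNodes.
            locations.setdefault(block, set()).update(nodes)
    return files, locations


if __name__ == "__main__":
    s1 = SubNameNodeSnapshot("job1", {"/logs/a.txt": ["blk_1", "blk_2"]},
                             {"blk_1": {"dn1", "dn2"}, "blk_2": {"dn3"}})
    s2 = SubNameNodeSnapshot("job2", {"/logs/b.txt": ["blk_3"]},
                             {"blk_3": {"dn2"}, "blk_2": {"dn4"}})
    files, locations = rebuild_namenode([s1, s2])
    print(sorted(files))               # ['/logs/a.txt', '/logs/b.txt']
    print(sorted(locations["blk_2"]))  # ['dn3', 'dn4']
```

The sketch raises an error on conflicting block lists rather than silently choosing one. This keeps the merge conservative: an inconsistency between Sub_NameNodes surfaces immediately instead of producing a corrupted namespace.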


