
Design and Implementation of a Cluster-Based FTP Server Supporting a Scalable Distributed File System for Cloud Computing

An FTP Server Cluster Supporting Scalable Distributed File System for Cloud Computing

Advisor: Mei-Ling Chiang (姜美玲)

Abstract


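In recent years, the rapid growth of the Internet, network services, and the number of users has posed great challenges to the network bandwidth and server load of popular websites such as Google and Yahoo!. Building a computing platform that is cost-effective, efficient, scalable, and easy to maintain has therefore become very important. For example, Yahoo! runs on the Hadoop cloud computing platform. That platform provides a parallel computing architecture and uses the Hadoop Distributed File System (HDFS). HDFS includes file backup and fault-tolerance mechanisms that ensure data integrity and reliability, and the platform's storage capacity and computing power can be increased by adding nodes. FTP (File Transfer Protocol) is a standard network protocol widely used to share files over the Internet. A traditional FTP server runs on a single host, so as the volume of data keeps growing, data protection and system stability become important concerns. A single FTP server has limited bandwidth and cannot handle a large number of concurrent FTP connections. To overcome this, we use the LVS-CAD cluster platform to build an FTP server cluster: instead of one FTP server, our experimental platform consists of multiple FTP servers forming a cluster. Each FTP server in the cluster stores different files. Our FTP Multi-Handoff mechanism manages and distributes users' FTP connections. It reads the file name carried in each file-operation packet a user sends and decides whether the FTP connection must be handed off to another FTP server, thereby achieving load sharing and load balancing. Because HDFS offers data protection and high scalability, we use it as the file system of the FTP servers. Its backup and fault-tolerance mechanisms ensure data integrity, and when storage runs short, the file system can be expanded by adding data nodes. Moreover, even if one FTP server fails and cannot provide service, part of the data in the cluster remains accessible. Experimental results show that, when the files users request are evenly distributed across the servers, the overall FTP throughput of the cluster with our FTP Multi-Handoff mechanism grows in proportion to the number of FTP servers. This substantially spreads the load of a single FTP server. We also measured the additional time overhead that the FTP Multi-Handoff mechanism adds to the LVS-CAD cluster platform. Based on packet content, the file-operation commands we implemented fall into two types: download and delete. For these two types, the measured packet-forwarding times were 2932 ms and 2926 ms, and the FTP server switching times were 4.24 ms and 2.92 ms, respectively. When the files users request are not all located on the same FTP server, the cluster still spreads the load across FTP servers, despite the extra cost of packet forwarding and server switching.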
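The handoff decision described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the thesis's implementation, which runs inside the LVS-CAD cluster platform. The sketch assumes that the dispatcher sees each FTP control-channel command. It also assumes that the two operation types map to the RFC 959 `RETR` (download) and `DELE` (delete) commands. A hypothetical `FILE_LOCATION` table records which server stores each file. The names `Connection`, `parse_command`, and `route` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical placement table: which back-end FTP server stores each file.
# (In the thesis each server holds different files; these entries are made up.)
FILE_LOCATION: Dict[str, str] = {
    "report.pdf": "ftp1",
    "data.csv": "ftp2",
    "video.mp4": "ftp3",
}

# Assumed mapping of the two file-operation types to RFC 959 commands.
FILE_COMMANDS: Dict[str, str] = {"RETR": "download", "DELE": "delete"}


@dataclass
class Connection:
    client: str
    server: str  # back-end FTP server currently serving this connection


def parse_command(line: str) -> Optional[Tuple[str, str]]:
    """Return (verb, filename) for a file-operation command, else None."""
    parts = line.strip().split(" ", 1)
    if len(parts) != 2:
        return None  # e.g. "LIST", "PWD": no file-name argument
    verb = parts[0].upper()  # FTP verbs are case-insensitive
    if verb not in FILE_COMMANDS:
        return None
    return verb, parts[1]


def route(conn: Connection, line: str) -> Tuple[str, bool]:
    """Pick the server for one command; return (server, handed_off)."""
    parsed = parse_command(line)
    if parsed is None:
        return conn.server, False  # not a file operation: stay put
    _verb, filename = parsed
    owner = FILE_LOCATION.get(filename)
    if owner is None or owner == conn.server:
        return conn.server, False  # unknown file or already local
    conn.server = owner  # hand the connection off to the owning server
    return owner, True


if __name__ == "__main__":
    conn = Connection(client="10.0.0.5", server="ftp1")
    for cmd in ("RETR data.csv\r\n", "DELE data.csv\r\n", "LIST\r\n"):
        print(repr(cmd.strip()), "->", route(conn, cmd))
```

Running it hands the connection from `ftp1` to `ftp2` on the `RETR`. The connection then stays on `ftp2` for the `DELE` of the same file and for the non-file `LIST` command. A real dispatcher would also have to migrate the TCP connection state and the FTP data-channel setup, which this sketch leaves out.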

English Abstract


With the rapid growth of the Internet, web services, and the number of users, popular websites such as Google and Yahoo! face great challenges in network bandwidth and server load. Building a computing platform that is cost-effective, highly efficient, scalable, and easy to maintain is therefore very important. For example, Yahoo! uses Hadoop as its cloud computing platform, which includes a parallel computing architecture and the Hadoop Distributed File System (HDFS). HDFS provides file backup and fault-tolerance mechanisms to ensure data integrity and reliability. By adding nodes, the platform can increase its storage capacity and computing capability. File Transfer Protocol (FTP) is a standard network protocol widely used to transfer files over the Internet. Traditionally, an FTP server is set up on a single host. As the amount of data grows, data security and system stability become important issues. A single FTP server has limited bandwidth and cannot handle a large number of simultaneous FTP connections. To overcome this, we use the LVS-CAD platform to build an FTP server cluster. Instead of a single FTP server, our experimental platform consists of multiple FTP servers forming a cluster. In this cluster, each FTP server stores different files. We propose and implement the FTP Multi-Handoff mechanism to hand off an existing FTP connection from one FTP server to another. To achieve load sharing and load balancing among the FTP servers, this mechanism examines the file name contained in each client request packet. It then determines whether the current FTP connection needs to be handed off to another FTP server. Because HDFS offers data security and high scalability, we use it as the file system of our FTP server cluster; its file backup and fault-tolerance mechanisms ensure data integrity.
When storage capacity is insufficient, data nodes can be added to expand the file system's storage space. Moreover, even if one FTP server in the cluster fails and cannot be accessed, part of the files remain available to clients. Experimental results show that, when the files requested by users are uniformly distributed across the FTP servers, the throughput of our FTP server cluster using the proposed FTP Multi-Handoff mechanism increases as the number of FTP servers grows. This demonstrates that our FTP server cluster successfully distributes the FTP load among the servers. We also measure the additional overhead that the FTP Multi-Handoff mechanism introduces into the LVS-CAD cluster. Our file-operation commands fall into two types: download and delete. For these two types, the measured times for forwarding a packet from the client to an FTP server in the cluster are 2926 ms and 2932 ms, respectively. The measured times for switching FTP servers are 4.24 ms and 2.92 ms, respectively. When the files requested by users are not all located on the same FTP server, our cluster with the proposed FTP Multi-Handoff mechanism still distributes the FTP load among the servers despite this overhead.
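A back-of-the-envelope model helps read the switching times above. Assume (our assumption, not the thesis's) that each requested file is equally likely to reside on any of N servers, independently of where the connection currently sits. A file command then needs a handoff with probability (N-1)/N, so the expected switching overhead per command is (N-1)/N times the measured switching time. The sketch plugs in the reported 4.24 ms (download) and 2.92 ms (delete) figures. It ignores packet-forwarding time and queueing. The function names are hypothetical.

```python
from fractions import Fraction

# Switching times reported in the abstract, in milliseconds.
SWITCH_MS = {"download": 4.24, "delete": 2.92}


def handoff_probability(n_servers: int) -> Fraction:
    """P(a file command targets another server), assuming uniform, independent placement."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return Fraction(n_servers - 1, n_servers)


def expected_switch_ms(n_servers: int, op: str) -> float:
    """Expected server-switching overhead per file command, in milliseconds."""
    return float(handoff_probability(n_servers)) * SWITCH_MS[op]


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        costs = {op: round(expected_switch_ms(n, op), 2) for op in SWITCH_MS}
        print(f"{n} servers: P(handoff) = {handoff_probability(n)}, cost = {costs}")
```

For four servers, for example, three of every four file commands would be expected to trigger a handoff, adding about 3.18 ms per download command on average. As N grows, the probability approaches 1 and the expected cost approaches the full switching time. Under this model, the per-command overhead is therefore bounded by the measured switching time regardless of cluster size.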

