
支援感測器資料儲存與網頁服務之具擴展性的虛擬化伺服器叢集

A Scalable VM-based Server Cluster for Supporting Sensor Data Storage and Web Services

Advisor: 姜美玲

Abstract


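With the rapid development of Internet of Things (IoT) technology, IoT devices and related applications have become widespread in daily life. For example, as air pollution worsens, the public pays increasing attention to air quality indicators, so governments and businesses deploy large numbers of air quality sensors to monitor air quality. The more sensors are deployed, the more data they transmit. As data accumulates over time, more storage space is needed, and searching and processing the data takes longer, so administrators must add servers to process and store the sensor data and maintain service performance.

In this thesis, we use the Linux Virtual Server (LVS) server cluster and virtualization technology to build a virtual server cluster. The cluster consists of a front-end server, which runs the load balancing mechanism and forwards client requests, and several back-end servers, which actually provide the services. To provide fault tolerance, we also build a backup server for the front-end server. On each of several physical servers, we create several virtual machines that act as back-end servers, providing either the sensor data collection service or the web service. For data storage, we use the Hadoop Distributed File System (HDFS) and the distributed non-relational database HBase to process and store the large volume of sensor data, relying on their high performance and data fault tolerance to ensure that data is not lost.

In addition, for virtual server clusters that provide multiple services of different kinds across multiple physical servers, we design and implement a new load balancing algorithm. To achieve better load balance, it takes into account the load of all back-end servers providing services on each physical server. The load balancing algorithms provided by traditional LVS consider only a single service when computing server load: they count only the connections of back-end servers that handle the same service and ignore the connections of back-end servers providing other services, which leads to load imbalance among the physical servers. In our proposed algorithm, when the front-end server forwards a web service request, it also considers the current load of back-end servers that run on the same physical server but provide different services, so the cluster achieves better load balance.

We ran a variety of tests on the implemented virtualized server cluster. The experimental results show that our system offers fault tolerance, high scalability, high reliability, and high performance, and that a virtual server cluster built on multiple lower-cost personal computers outperforms one built on a single higher-performance server. The results also show that, for a virtualized server cluster in which virtual machines providing different kinds of services run on multiple physical servers, our proposed load balancing algorithm avoids load imbalance among the physical servers, because it considers the current load of back-end servers providing different services on the same physical server, and thus effectively improves overall server performance.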

Parallel Abstract


With the rapid development of Internet of Things (IoT) technology, a wide variety of IoT devices and related applications have entered daily life. For example, people in Taiwan now pay more attention to air quality indicators because air pollution is becoming more serious. Government and business organizations therefore deploy large numbers of sensors to monitor air quality, and the volume of sensing data grows with the number of sensors deployed. Over time, the storage requirement increases as well, and servers need more time to process the growing amount of data. Administrators must then expand server capacity to process and store the data from all the sensors and maintain service performance.

In this study, we use Linux Virtual Server (LVS) and virtualization technology to deploy a virtual machine (VM) cluster named VMC/H. It consists of front-end servers, which execute the designated load distribution algorithm to dispatch client requests, and a set of back-end servers, which actually process the requests. To achieve high availability, a standby front-end server is built as a backup. On each host, several VMs are built as back-end servers, which collect data from sensors and provide web services. To prevent data loss, we also use the Hadoop Distributed File System (HDFS) and HBase to process and store the large volume of sensor data, mainly to take advantage of their high performance and high availability.

In addition, we design and implement a new load distribution algorithm for VM-based server clusters that provide multiple types of services. To achieve load balance among physical machines, a load distribution algorithm should consider the aggregate load of all back-end servers on each host. However, the load distribution algorithms originally developed for LVS clusters consider only the load of back-end servers that provide the requested service, ignoring the load of back-end servers that provide other types of services. This leads to load imbalance across the system, so the original algorithms need to be redesigned and adapted to the proposed VM cluster. Our proposed load distribution algorithm considers the load of all back-end servers on each host, even those providing different services, to achieve better load balance among the hosts.

We performed several experiments on the proposed VMC/H cluster. The experimental results demonstrate that the proposed system offers fault tolerance, high scalability, high availability, and high performance. We also compare the performance of two VMC/H clusters, one constructed on multiple physical machines and the other on a single physical machine. The results show that the VMC/H cluster constructed on multiple lower-capability physical machines achieves better system performance. The experimental results also show that the proposed load distribution algorithm, which considers the load of all back-end servers providing services on each host, outperforms the original LVS load distribution algorithms. By maintaining better load balance among the hosts, it effectively improves cluster performance.
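
The difference between per-service and host-aware dispatching can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that load is measured as the active connection count, that a host's load is the plain sum over its VMs, and that ties are broken by the VM's own connection count. The `Backend` class, the function names, and the sample VM and host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str         # hypothetical VM name
    host: str         # physical machine the VM runs on
    service: str      # service type, e.g. "web" or "sensor"
    connections: int  # current active connections (assumed load metric)


def least_connection(backends, service):
    """Per-service choice: only VMs of the requested service are compared."""
    candidates = [b for b in backends if b.service == service]
    return min(candidates, key=lambda b: b.connections)


def host_aware_least_connection(backends, service):
    """Prefer the candidate whose physical host has the lowest total load.

    The host load sums the connections of every VM on that host,
    regardless of which service the VM provides (an assumption here).
    """
    host_load = {}
    for b in backends:
        host_load[b.host] = host_load.get(b.host, 0) + b.connections
    candidates = [b for b in backends if b.service == service]
    # Tie-break on the VM's own connection count.
    return min(candidates, key=lambda b: (host_load[b.host], b.connections))


backends = [
    Backend("web-1", "host-A", "web", 10),
    Backend("sensor-1", "host-A", "sensor", 90),
    Backend("web-2", "host-B", "web", 20),
    Backend("sensor-2", "host-B", "sensor", 5),
]

print(least_connection(backends, "web").name)
print(host_aware_least_connection(backends, "web").name)
```

In this sample, the per-service rule selects `web-1` because it has the fewest web connections, even though its host already carries 100 connections in total. The host-aware rule selects `web-2`, whose host carries only 25.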

