
異質Hadoop下的平行度利用

Exploiting Parallelism in Heterogeneous Hadoop System

Advisor: 廖世偉

Abstract


With the rise of big data, Apache Hadoop has attracted growing attention. Apache Hadoop has two core components: the Hadoop Distributed File System and the MapReduce framework. MapReduce is a programming model for distributed computation. However, MapReduce is not efficient enough: its implementation does not fully exploit parallelism to speed up processing, and instead computes serially. To solve this problem, this thesis proposes a new Hadoop framework that fully exploits parallelism through parallel processing. For better performance, we also leverage the computational power of the GPU to accelerate the program. In addition, to make full use of both the CPU and the GPU and reduce execution time, we propose a scheduling method that dynamically dispatches computation to the appropriate resource. Our experimental results show that the proposed system achieves a 1.45x speedup over Hadoop.
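The map/shuffle/reduce flow that the abstract refers to can be sketched as follows. This is a minimal, hypothetical word-count illustration of the MapReduce programming model; the function names and the word-count task are illustrative assumptions, not code from the thesis.

```python
from collections import defaultdict

def map_phase(record):
    # Mapper: emit a (word, 1) pair for every word in one input record.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: aggregate the counts emitted for one word.
    return key, sum(values)

def mapreduce(records):
    # Run all mappers, shuffle, then run one reducer per key.
    intermediate = [pair for r in records for pair in map_phase(r)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, `mapreduce(["big data", "big cluster"])` counts "big" twice and the other words once. In real Hadoop each mapper and reducer runs as a separate task across the cluster; the thesis's point is that each mapper internally still processes its records serially.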

Keywords

Big Data, Heterogeneous System

Parallel Abstract


With the rise of big data, Apache Hadoop has attracted increasing attention. There are two primary components at the core of Apache Hadoop: the Hadoop Distributed File System (HDFS) and the MapReduce framework. MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. However, the MapReduce framework is not efficient enough: the Hadoop implementation does not fully exploit the parallelism available in the map phase to enhance performance, as each mapper processes its input with a serial algorithm rather than a parallel one. To solve this problem, this thesis proposes a new Hadoop framework that fully exploits parallelism through parallel processing. For better performance, we utilize the GPGPU's computational power to accelerate the program. In addition, in order to utilize both the CPU and the GPU to reduce the overall execution time, we also propose a scheduling policy that dynamically dispatches computation to the appropriate device. Our experimental results show that our system achieves a speedup of 1.45x over Hadoop on the benchmarks.
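The dynamic dispatch idea mentioned above can be sketched as a greedy earliest-finish-time policy: each task goes to whichever device is expected to finish it soonest, given a per-device cost estimate and the work already queued there. The device names and the cost model below are illustrative assumptions, not the thesis's actual scheduler.

```python
import heapq

def dispatch(tasks, cost):
    """tasks: list of task ids; cost: device name -> estimated seconds per task.
    Returns a task -> device mapping, greedily minimizing each finish time."""
    # Each heap entry is (time when the device becomes free, device name).
    devices = [(0.0, name) for name in cost]
    heapq.heapify(devices)
    assignment = {}
    for task in tasks:
        # Pick the device that frees up earliest and queue the task on it.
        free_at, name = heapq.heappop(devices)
        assignment[task] = name
        heapq.heappush(devices, (free_at + cost[name], name))
    return assignment
```

With a hypothetical cost estimate of 4 s per task on the CPU and 1 s on the GPU, `dispatch(["t0", "t1", "t2", "t3", "t4"], {"cpu": 4.0, "gpu": 1.0})` sends most tasks to the GPU while still keeping the CPU busy, which is the load-balancing behavior the abstract describes.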

Parallel Keywords

Big Data, Heterogeneous System

