Improving MapReduce Performance by Exploiting Input Redundancy

Abstract


The proliferation of data-parallel programming on large clusters has opened a new research avenue: accommodating numerous types of data-intensive applications with a feasible plan. Across the many research efforts in this area, we observe a nontrivial amount of redundant I/O in the execution of data-intensive applications. This redundancy has emerged as an issue in the recent literature because even the locality-aware scheduling policy of a MapReduce framework is ineffective in a cluster environment where storage nodes cannot provide a computation service. In this article, we introduce SplitCache, which improves the performance of data-intensive OLAP-style applications by reducing redundant I/O in a MapReduce framework. The key strategy is to eliminate I/O redundancy, especially when different applications read common input data within an overlapping time period: SplitCache caches the first input stream in the computing nodes and reuses it for future demands. We also design a cache-aware task scheduler that plays an important role in achieving high cache utilization. In executing the TPC-H benchmark, we achieved 64.3% faster execution and an 83.48% reduction in network traffic on average.
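The caching and scheduling idea described in the abstract can be illustrated with a toy model. This is a minimal sketch and not the authors' implementation: the names `SplitCacheSketch`, `pick_node`, and `run_job` are hypothetical, splits are plain string identifiers, and the fallback placement rule (choose the node holding the fewest cached splits) is an assumption made only to keep the example deterministic.

```python
from collections import defaultdict


class SplitCacheSketch:
    """Toy model (hypothetical): tracks which compute node caches which input split."""

    def __init__(self):
        self.cached = defaultdict(set)  # node name -> set of cached split ids
        self.remote_reads = 0           # reads that had to go to remote storage

    def read_split(self, node, split_id):
        """Return where the split was served from and cache it on a miss."""
        if split_id in self.cached[node]:
            return "cache"
        self.remote_reads += 1
        self.cached[node].add(split_id)
        return "remote"


def pick_node(cache, split_id, nodes):
    """Cache-aware placement: prefer a node that already holds the split."""
    for node in nodes:
        if split_id in cache.cached[node]:
            return node
    # Assumed fallback: the node with the fewest cached splits.
    return min(nodes, key=lambda n: len(cache.cached[n]))


def run_job(cache, splits, nodes):
    """Schedule one map task per split and record where each input came from."""
    sources = []
    for split_id in splits:
        node = pick_node(cache, split_id, nodes)
        sources.append(cache.read_split(node, split_id))
    return sources


if __name__ == "__main__":
    cache = SplitCacheSketch()
    nodes = ["n0", "n1"]
    splits = ["s0", "s1", "s2"]
    print(run_job(cache, splits, nodes))  # first job reads from remote storage
    print(run_job(cache, splits, nodes))  # second job reuses the cached splits
    print(cache.remote_reads)
```

In this sketch, a second job over the same input is served entirely from the compute-node caches, which is the kind of redundant I/O the abstract targets. The model ignores cache capacity, eviction, and task timing, all of which a real scheduler would have to handle.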

