
Improving MapReduce Performance by Exploiting Input Redundancy

Abstract


The proliferation of data-parallel programming on large clusters has opened a new research avenue: accommodating numerous types of data-intensive applications with a feasible plan. Across these research efforts, a nontrivial amount of redundant I/O can be observed in the execution of data-intensive applications. This redundancy has emerged as an issue in the recent literature because even the locality-aware scheduling policy of a MapReduce framework is ineffective in a cluster environment where storage nodes cannot provide a computation service. In this article, we introduce SplitCache, which improves the performance of data-intensive OLAP-style applications by reducing redundant I/O in a MapReduce framework. The key strategy is to eliminate I/O redundancy, especially when different applications read common input data within an overlapping time period; SplitCache caches the first input stream in the computing nodes and reuses it for future demands. We also design a cache-aware task scheduler that plays an important role in achieving high cache utilization. In executing the TPC-H benchmark, we achieved 64.3% faster execution and an 83.48% reduction in network traffic on average.
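
The caching and cache-aware scheduling idea described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the SplitCache implementation: the class name SplitCacheSketch, its methods, the node and split identifiers, and the least-loaded tie-breaking rule are all hypothetical and chosen only for illustration. It assumes every compute node can cache any split, and it counts a "remote read" whenever a task runs on a node that does not yet hold its input split.

```python
from collections import defaultdict


class SplitCacheSketch:
    """Toy model (hypothetical): tracks which compute node caches which input split."""

    def __init__(self, nodes):
        # Per-node set of cached split ids; caches persist across jobs.
        self.cached = {node: set() for node in nodes}
        # Number of splits fetched from storage nodes over the network.
        self.remote_reads = 0

    def pick_node(self, split_id, load):
        """Prefer a node that already caches the split; otherwise use any node.

        Ties are broken by current load, then by node name (an assumed policy).
        """
        holders = [n for n in self.cached if split_id in self.cached[n]]
        candidates = holders or list(self.cached)
        return min(candidates, key=lambda n: (load[n], n))

    def run_job(self, split_ids):
        """Assign one map task per split and return the split -> node placement."""
        load = defaultdict(int)
        placement = {}
        for split_id in split_ids:
            node = self.pick_node(split_id, load)
            if split_id not in self.cached[node]:
                # Cache miss: read the split remotely once, then keep it locally.
                self.remote_reads += 1
                self.cached[node].add(split_id)
            load[node] += 1
            placement[split_id] = node
        return placement


cluster = SplitCacheSketch(["n1", "n2"])
first = cluster.run_job(["s1", "s2", "s3", "s4"])   # every split is a cache miss
second = cluster.run_job(["s1", "s2", "s3", "s4"])  # same input, served from caches
```

In this sketch, a second job that reads the same splits is placed on the nodes that already cache them, so it adds no further remote reads. A real scheduler would also have to handle cache capacity, eviction, and concurrent jobs, none of which are modeled here.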

Cited by


Lin, C. T. (2014). 去重複資料之相變化記憶體儲存系統空間管理 [master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2014.02703
