G-Storm: A GPU-Aware Scheduling Method for Storm

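We are now moving toward a data economy, in which the ability to analyze large volumes of data efficiently is the key to success. Many systems for processing big data have been developed; among them, Storm is designed for processing data streams. By default, Storm uses only a fairly simple round-robin policy to schedule tasks. This policy performs well on homogeneous platforms but cannot use resources effectively in heterogeneous environments. In this thesis, we design and implement G-Storm, a new Storm scheduling algorithm that lets Storm evaluate and exploit GPU cards to accelerate computation. Our experiments show that G-Storm achieves a 1.65x speedup over Storm's default scheduler under lighter workloads, and a speedup of nearly 2.04x under heavier workloads.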

Keywords

big data; stream processing; GPU; Storm

Parallel Abstract

We are now shifting toward a data-driven economy, in which the ability to analyze huge amounts of data efficiently and in time is the key to success. Many systems for big data processing have been developed; Storm is one of them, and it targets stream data processing. By default, Storm provides only a very simple round-robin scheduling policy for assigning tasks. The default scheduler performs well on homogeneous platforms but does not work well in heterogeneous computing environments. In this thesis, we propose and implement a new Storm scheduling algorithm, named G-Storm, that lets Storm evaluate GPU capacity when scheduling and make more effective use of GPUs to improve overall performance. The experimental results show that, compared to Storm with the default scheduler, G-Storm achieves a speedup of 1.65x on lightly loaded topologies and 2.04x on heavily loaded topologies.

Parallel Keywords

big data; stream processing; GPU; Storm
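The contrast the abstract draws between round-robin scheduling and capacity-aware scheduling on heterogeneous nodes can be illustrated with a small sketch. This is not the G-Storm algorithm, which the abstract does not describe; it is a generic greedy heuristic under stated assumptions. The node names, the relative capacity values (a GPU node modeled as 4 times faster), the task costs, and the helper names `round_robin`, `capacity_aware`, and `makespan` are all hypothetical.

```python
from itertools import cycle

def round_robin(tasks, workers):
    """Assign tasks to workers in turn, ignoring how fast each worker is."""
    assignment = {w: [] for w in workers}
    for (name, _cost), worker in zip(tasks, cycle(workers)):
        assignment[worker].append(name)
    return assignment

def capacity_aware(tasks, capacity):
    """Greedy heuristic (an assumption, not G-Storm itself): place each task,
    largest first, on the worker whose projected finish time is lowest."""
    load = {w: 0.0 for w in capacity}
    assignment = {w: [] for w in capacity}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        best = min(capacity, key=lambda w: (load[w] + cost) / capacity[w])
        assignment[best].append(name)
        load[best] += cost
    return assignment

def makespan(assignment, costs, capacity):
    """Time until the slowest worker finishes: its total work / its capacity."""
    return max(
        sum(costs[t] for t in names) / capacity[w]
        for w, names in assignment.items()
    )

# Hypothetical workload: four equal tasks, one CPU-only node and one GPU node.
tasks = [("t1", 4), ("t2", 4), ("t3", 4), ("t4", 4)]
capacity = {"cpu-node": 1.0, "gpu-node": 4.0}  # assumed relative throughput
costs = dict(tasks)

rr = round_robin(tasks, list(capacity))
aware = capacity_aware(tasks, capacity)
print("round-robin makespan:   ", makespan(rr, costs, capacity))
print("capacity-aware makespan:", makespan(aware, costs, capacity))
```

In this toy setting round-robin splits the tasks evenly, so the slower node becomes the bottleneck, while the capacity-aware heuristic shifts most of the work to the faster node. The numbers are illustrative only and are unrelated to the speedups reported in the thesis.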

