
Optimization of Hadoop System Configuration Parameters

Advisor: 廖世偉
This thesis will be available for download on 2025/06/29.

Abstract


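In the current era of big data, the Hadoop system plays a crucial role in analyzing and applying large-scale data. We would like to tune Hadoop's system parameters to their best settings without spending more on hardware upgrades. My master's thesis therefore focuses on optimizing Hadoop system parameters, with the specific performance goal of reducing the execution time of a single job. I adopt a three-stage approach. (1) Among the many parameters, identify those with the greatest impact on the system, examining the map and reduce phases separately and selecting 20 parameters as the main tuning targets. (2) Build a prediction model for job execution time: collect further job execution times together with the corresponding settings of these 20 parameters as the basis for the model, apply machine-learning methods, and select the best-fitting three-layer model. (3) Build an optimization model: in each optimization step, randomly sample parameter values within the configured ranges and feed them into the previously built prediction model to estimate their execution time; the optimization model eventually finds the parameter combination with the shortest execution time. I validated this combined approach on four programs in total.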
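Stage (1) can be illustrated with a minimal sketch that ranks parameters by the absolute Pearson correlation between each parameter's value and the measured job execution time. The abstract does not state which importance measure the thesis actually uses, so correlation is only a simple stand-in here. The parameter names are real Hadoop 2.x configuration keys chosen purely for illustration, and the sample runs and timings are invented.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a parameter that never varies carries no signal
    return cov / (sx * sy)


def rank_parameters(runs: List[Dict[str, float]], times: List[float], top_k: int) -> List[str]:
    """Return the top_k parameter names, ordered by |correlation| with job time."""
    scores = {
        name: abs(pearson([run[name] for run in runs], times))
        for name in runs[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical map-side measurements: one dict of settings per job run,
# paired with that run's execution time in seconds (invented numbers).
map_runs = [
    {"mapreduce.task.io.sort.mb": 100, "mapreduce.map.sort.spill.percent": 0.80},
    {"mapreduce.task.io.sort.mb": 200, "mapreduce.map.sort.spill.percent": 0.70},
    {"mapreduce.task.io.sort.mb": 300, "mapreduce.map.sort.spill.percent": 0.90},
    {"mapreduce.task.io.sort.mb": 400, "mapreduce.map.sort.spill.percent": 0.80},
]
map_times = [120.0, 110.0, 101.0, 90.0]

print(rank_parameters(map_runs, map_times, top_k=1))
# prints ['mapreduce.task.io.sort.mb']
```

Following the abstract, the ranking would be run once over map-side parameters and once over reduce-side parameters, keeping the top entries from each group. Correlation captures only linear, one-parameter-at-a-time effects, so parameters that matter mainly through interactions may be under-ranked.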

Keywords

system, optimization

English Abstract


The Hadoop system has become very popular in recent years. It is a software framework for the distributed processing of large-scale data sets on a cluster of machines using the MapReduce programming model. However, Hadoop users still face two essential challenges in managing the system: (1) tuning the parameters appropriately, and (2) dealing with the dozens of configuration parameters that affect its performance. This thesis focuses on optimizing Hadoop MapReduce job performance. Our approach has two key models: Prediction and Optimization. The Prediction model estimates the execution time of a MapReduce job, and the Optimization model searches for approximately optimal configuration parameters by invoking the Prediction model repeatedly. This analytical method selects approximately optimal configuration parameters to improve users' job performance. Besides configuration parameter tuning, this thesis also discusses the relevance of each parameter and evaluates our methods. Our work may give users a better way to improve Hadoop system performance while saving hardware resources.
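The Prediction/Optimization split can be sketched as a random search that calls a cheap predictor many times instead of running real jobs. Everything below is an assumption for illustration. The three parameters and their ranges are hypothetical; the thesis tunes 20 parameters whose ranges are not given here. Uniform sampling is assumed, and `toy_predictor` is an invented formula standing in for the trained machine-learning model.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, float]

# Hypothetical search space: (low, high) bounds per parameter.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "mapreduce.task.io.sort.mb": (50, 500),
    "mapreduce.map.sort.spill.percent": (0.5, 0.95),
    "mapreduce.job.reduces": (1, 32),
}
# Parameters that only take whole-number values.
INTEGER_PARAMS = {"mapreduce.task.io.sort.mb", "mapreduce.job.reduces"}


def sample_config(space: Dict[str, Tuple[float, float]], rng: random.Random) -> Config:
    """Draw one configuration uniformly at random within the given bounds."""
    cfg: Config = {}
    for name, (lo, hi) in space.items():
        if name in INTEGER_PARAMS:
            cfg[name] = rng.randint(int(lo), int(hi))  # inclusive bounds
        else:
            cfg[name] = rng.uniform(lo, hi)
    return cfg


def random_search(
    predict: Callable[[Config], float],
    space: Dict[str, Tuple[float, float]],
    iterations: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Keep the sampled configuration with the lowest predicted execution time."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_time = float("inf")
    for _ in range(iterations):
        cfg = sample_config(space, rng)
        t = predict(cfg)  # a model evaluation, not a real Hadoop job
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time


def toy_predictor(cfg: Config) -> float:
    """Hypothetical stand-in for the trained prediction model (made-up formula)."""
    return (
        60.0
        + 2000.0 / cfg["mapreduce.task.io.sort.mb"]
        + 40.0 * abs(cfg["mapreduce.map.sort.spill.percent"] - 0.8)
        + 0.5 * abs(cfg["mapreduce.job.reduces"] - 8)
    )


best_cfg, best_time = random_search(toy_predictor, SEARCH_SPACE, iterations=2000)
print(f"predicted time: {best_time:.1f} s")
for name, value in best_cfg.items():
    print(f"  {name} = {value}")
```

Each iteration costs only a model evaluation, so thousands of candidates can be scored cheaply. The result, however, is only as good as the predictor, so a configuration found this way would still need to be confirmed with a real job run.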

English Keywords

tuning, optimization, predictor

