透過您的圖書館登入
IP:13.58.60.192
  • 學位論文

基於MapReduce的大型網路探勘之應用與評估

Exploiting and Evaluating MapReduce for Large-scale Graph Mining

指導教授 : 林守德

摘要


大型網路探勘的目標是針對網路結構以及網路中隱藏的樣式進行深入了解。在處理大型網路時,巨大的時間與空間複雜度將會影響分析的效率。目前雲端運算正被使用於解決大型網路分析的效率問題,其中MapReduce是目前最流行的一種雲端運算的運算架構。在本研究中,我們提出一個基於MapReduce的運算架構來解決大型網路分析的效率問題,此架構稱之為MapReduceGraphMiningFramework(MGMF),包含基本函式、MapReduce演算法及最佳化方法。我們將網路探勘演算法分成四大類,並基於此架構來針對各種不同的網路探勘演算法進行有效率而且完整的設計並實作。 PEGASUS是當前最新的一個基於MapReduce的網路分析工具,MGMF針對PageRank的實作比PEGASUS快了3至20倍的執行時間,而MGMF提供的架構也更有效率且支援更多樣的演算法,實驗進行於包含超過十億個聯結的大型真實世界網路上,實驗證明運算架構MGMF兼具效率及可擴展性。除此之外,我們還對單一機器的運算能力進行檢驗,進一步探討適合使用雲端運算的時機,我們實作的MGMF作為開放原始碼工具開放給大眾使用,函式庫及測試資料網址為http://mslab.csie.ntu.edu.tw/~noahsark/MGMF/。

並列摘要


Graph mining is a popular technique for discovering the hidden structure or important instances from a graph. However, when dealing with large-scale graphs with billions of entities, it generally draws concerns about the effi- ciency in computation. On the other hand, cloud computing is often resorted to as a feasible solution to ease the computational burden. In this work, we present the MapReduce Graph Mining Framework (MGMF), an open source graph mining library aims at providing a robust and efficient MapReduce- based graph mining tool. We categorize graph mining algorithms into four categories and design different MapReduce frameworks for each. Comparing our work to PEGASUS, which is the state-of-the-art library for graph mining on MapReduce, the experiment results show our package is 3 to 20 times more efficient than PEGASUS, and we have more com- plete coverage on different graph mining algorithms. We also validate our framework on billion-scale networks to show that it is scalable to the num- ber of machines. Finally, we explore the capability of a single machine on solving large-scale graph mining problems to provide more insight on the usage condition of a cloud computing machine on graph-mining algorithms. We have made our implementation an open-source library downloadable in http://mslab.csie.ntu.edu.tw/ ̃noahsark/MGMF/.

參考文獻


[2] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163–177, 2001.
[3] B. Cai and W. Xue. X-rime: Hadoop based large scale social network analysis, 2009. Available at http://xrime.sourceforge.net/.
[4] R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. SIGMOD (demo), 2010.
[6] J. Cohen. Graph twiddling in a mapreduce world. Computing in Science and Engineering, 11(4):29–41, 2009.
[7] T. Cormen. Introduction to algorithms. The MIT press, 2001.

延伸閱讀