BiFennel:針對大數據的快速二部圖劃分算法

在雲端計算被廣泛應用的今日，社交網絡分析、生物信息網絡、語義分析類的諸多應用對於快速處理億萬節點圖的能力有著迫切的需求，這也是分佈式高性能計算領域的研發重點之一。而許多諸如音樂、電影網絡的推薦系統，LDA主題模型等問題都可以通過二部圖的建模、計算進行解決。而圖劃分作為圖計算預處理最重要的環節，雖然已經是一個很成熟的技術，然而許多經典算法因為需要迭代多次，時間複雜度過大，因此應用到大規模圖劃分時時間過長。近年來也湧現出一些速度較快的圖劃分算法，然而這些算法并沒有針對二部圖進行針對性的優化。本文提出的BiFennel算法是一種新二部圖劃分算法，通過降低節點複製率，平衡圖數據負載的方式，進而降低迭代圖計算部分時間以及網路負載。最後，本文將BiFennel算法應用于圖計算平台PowerGraph [10]。實驗證明，相比於目前最好的Aweto算法，使用BiFennel後，圖引擎整體執行時間最高降低1.69倍，網路負載減少最高達到2.32倍。另外，當圖規模增大或是集群中節點數增多時，BiFennel的性能並沒有降低。

關鍵字

二部圖；圖劃分；圖計算； PowerGraph

並列摘要

Cloud computing is widely utilized in today’s internet service, which severely requires the ability of processing graphs of billion vertices rapidly in the situation such as social network analyze, bio-informational network analyze and semantic processing. Therefore, graph processing has a significant role in the research and development of high-performance computing. However, many problems such as music and movie recommendation web, LDA topic model can be solved by modeling these data into bipartite graph and computing it with graph processing engines. As the most important step in preprocessing, graph partitioning is a relatively mature technology. However, most of classic graph partitioning algorithms require iterative calculation for several times, which causes huge time complexity especially adapting to big data. There are some rapid algorithms these years, which cannot be used in bipartite graph directly. Therefore, this thesis proposed a new bipartite graph partitioning algorithm, BiFennel, which effectively decrease graph processing time and network loading by reducing vertex replication factor and maintaining work balance. Afterwards, this thesis applies BiFennel to a popular graph engine called PowerGraph and proves it gets up to 1.69 times overall runtime speedup of graph processing and 2.32 times network load reduction comparing to the best competitor, Aweto. Besides, the algorithm doesn’t meet a degradation when graph scale increases, so does its scalability.

並列關鍵字

Bipartite graph ； Graph partitioning ； Graph processing ； PowerGraph

參考文獻

1. Sergey B., Larry P. "The anatomy of a large-scale hypertextual web search engine. " Computer Networks and ISDN Systems, 30 (1998): 5-20.

2. Yu, Gu. "Large scale graph data processing on cloud computing environments." Chinese Journal of Computers, 34.10 (2010): 1753-1765.

3. Malewicz G., Austern M. H., et al. "Pregel: a system for large-scale graph processing. " In SIGMOD 10(2010): 135–146.  

6. Yucheng L., Bickson D., et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.

8. Tsourakakis C. E., Gkantsidis C., et al. "Fennel: Streaming graph partitioning for massive scale graphs. " Tech. Rep. 175918, Microsoft, 2012.

國際替代計量

BiFennel:針對大數據的快速二部圖劃分算法

主題瀏覽