透過您的圖書館登入
IP:3.135.231.36
  • 學位論文

應用於串流圖資料的分布式漸進模式匹配演算法

Distributed Incremental Pattern Matching on Streaming Graphs

指導教授 : 周志遠

摘要


隨著大數據研究的興起,提出了許多資料處理系統如Hadoop, Spark等。而在機器學習、社群網路、道路網絡流量和智慧電網等許多的應用中,資料可以被更進一步表示成「圖」的資料結構,因此許多的大數據圖處理系統架構相繼在近幾年提出,如Graphlab、Giraph等。然而,現存對於圖處理系統結合串流數據的研究依舊有限。因此,此研究中,我們的目標是開發一個用於解決圖模式匹配查詢的分佈式圖處理系統,資料圖會隨著外部讀入的串流數據而不停改變拓撲結構或屬性資料。為了實現這一目標,我們提出漸進式匹配算法上並將它實現在由Stanford大學開源的圖處理系統,GPS。這是一個以Vertex為計算單元的分佈式圖處理系統。我們改寫了GPS的架構來支援串流圖數據,並且我們更進一步採用subgraph-centric資料模型來減少網路傳輸開銷,以提升系統效能。我們使用維基百科作為效能評估的真實資料,我們的結果相較於傳統的批次做法有3 ~ 10倍的加速,並顯著降低網路流量和記憶體空間的使用情況。

並列摘要


Big data has shifted the computing paradigm of data analysis. While some of the data can be treated as simple texts or independent data records, many other applications have data with structural patterns which are modeled as a graph, such as social media, road network traffic and smart grid, etc. However, there is still limited amount of work has been done to address the velocity problem of graph processing. In this work, we aim to develop a distributed processing system for solving pattern matching queries on streaming graphs where graphs evolve over time upon the arrives of streaming graph update events. To achieve the goal, we proposed an incremental pattern matching algorithm and implemented it on GPS, a vertex centric distributed graph computing framework. We also extended the GPS framework to support streaming graph, and adapted a subgraph-centric data model to further reduce communication overhead and system performance. Our evaluation using real wiki trace shows that our approach achieves a 3x ~ 10x speedup over the batch algorithm, and significantly reduces network and memory usage.

參考文獻


[3] Nmon - nigel’s performance monitor. http://nmon.sourceforge.net/ pmwiki.php.
[9] Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., and Wu, Y. Graph pat- tern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment 3, 1-2 (2010), 264–275. 37
[10] Fan, W., Wang, X., Wu, Y., and Deng, D. Distributed graph simulation: Impossibility and possibility. Proceedings of the VLDB Endowment 7, 12 (2014), 1083–1094.
[12] Garey, M. R., and Johnson, D. S. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990.
[13] Godskesen, J. C., and Nanz, S. Mobility models and behavioural equiv- alence for wireless networks. In Coordination Models and Languages (2009), Springer, pp. 106–122.

延伸閱讀