透過您的圖書館登入
IP:216.73.216.60
  • 學位論文

圖形資料庫中封閉性頻繁圖形序列樣式之資料探勘

Mining Closed Frequent Sequential Graph Patterns in Graph Databases

指導教授 : 李瑞庭

摘要


在社群網站中,人們在某段時間中的互動關係可以用一個圖形來表示,其中,每個點代表一個人,兩點之間的邊代表人與人之間的互動關係。然而,點和邊會因為時間變化而有所不同,而這種會隨著時間改變的網路稱作動態網路。從動態網路中探勘頻繁樣式,可以幫助我們分析社群網站中的互動行為。因此,在這篇論文中,我們提出一個有效率的探勘演算法叫「SGP-Mine」,以找出在圖形資料庫中封閉性頻繁圖形序列樣式。SGP-Mine演算法主要可分成兩個階段。第一階段,我們先找出所有長度為一的頻繁樣式。在第二階段,我們以深度優先搜尋的方式遞迴產生所有的頻繁樣式。在產生的過程中,我們利用修剪策略刪除不必要的候選樣式,並用封閉性檢查移除非封閉性的樣式。因此,SGP-Mine能有效率地從圖形資料庫中找出封閉性頻繁圖形序列樣式。實驗結果顯示,不論在合成或真實資料庫中,我們所提出的方法皆優於改良式的Apriori演算法。

並列摘要


In social networks, a snapshot of interactions among users can be modeled by a graph, where a vertex stands for a user and an edge denotes an interaction between two users. The users and interactions may change over time and form a dynamic network. Mining sequential graph patterns from a dynamic network can help us analyze interaction behaviors on social networks. Therefore, in this thesis, we propose a novel algorithm, SGP-Mine (Sequential Graph Pattern-Mine), to mine the closed frequent sequential graph patterns in graph databases. The proposed algorithm consists of two phases. First, we use gSpan to find all frequent SGPatterns of length one (1-SGPatterns) from the database and record the occurrences for each 1-SGPattern found. Second, we recursively generate frequent SGPatterns in a DFS manner until no more frequent SGPatterns can be found. During the mining process, we employ three pruning strategies to prune unnecessary candidates and a closure checking scheme to eliminate non-closed patterns. Therefore, the proposed method can efficiently mine closed frequent sequential graph patterns in a graph database. The experiment results show the SGP-Mine algorithm outperforms the modified Apriori algorithm.

參考文獻


[3] C. Borgelt and M. R Berthold, Mining molecular fragments: Finding relevant substructures of molecules, Proceedings of IEEE International Conference on Data Mining, 2002, pp. 51-58.
[4] K. M. Borgwardt, H. Kriegel, and P. Wackersreuther, Pattern mining in frequent dynamic subgraphs, Proceedings of IEEE International Conference on Data Mining, 2006, pp. 818-822.
[5] N. Eagle, A. Pentland, and D. Lazer, Inferring social network structure using mobile phone data, Proceedings of the National Academy of Sciences, 2009, Vol. 106 (36), pp. 15274-15278.
[6] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co, San Francisco, 1979.
[7] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, Proceedings of ACM SIGMOD Conference on Management of Data, 2000, pp. 1-12.

延伸閱讀