A Study of Mining Strategies for Frequent Isomorphic Graphs

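As techniques for mining frequent itemsets and sequential patterns have matured, it is natural to go a step further and study a form of pattern mining that captures a broader range of relationships in data: graph mining. Graph mining has a wide range of applications. Well-known domains include chemistry, biology, and computer networks, along with any other real-world data that can be mapped to graph patterns; all of these rely on graph pattern mining to support data analysis and prediction. The main challenge in graph mining is solving the subgraph/graph isomorphism problem. In this thesis, we propose an algorithm that combines a canonical form for graphs with an embedding data structure to mine graph databases efficiently. The core idea is to use the canonical form to eliminate duplicate enumeration, and to carefully record the locations of each graph pattern in the database (the embedding list) so that subgraph isomorphism checks are avoided entirely. Experiments show that the proposed algorithm outperforms gSpan in mining efficiency on both synthetic and real datasets.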
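To illustrate how a canonical form can eliminate duplicate enumeration, the following is a minimal sketch, not the canonical form used in the thesis (the abstract does not specify it). It assumes small undirected graphs with string labels and finds the smallest code over all vertex orderings by brute force. The names `canonical_code` and `make_edges` and the example graphs are hypothetical.

```python
from itertools import permutations


def make_edges(triples):
    """Build an edge map {frozenset({u, v}): edge_label} from (u, v, label) triples."""
    return {frozenset((u, v)): label for u, v, label in triples}


def canonical_code(vertex_labels, edges):
    """Return the smallest (labels, adjacency) code over all vertex orderings.

    Isomorphic graphs yield the same code, so a miner can keep a set of
    codes and skip any pattern whose code it has already seen.
    Brute force is factorial in the number of vertices: illustration only.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        labels = tuple(vertex_labels[v] for v in perm)
        # Upper triangle of the adjacency matrix; "" marks a missing edge.
        adjacency = tuple(
            edges.get(frozenset((perm[i], perm[j])), "")
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, adjacency)
        if best is None or code < best:
            best = code
    return best


# Two drawings of the same path C-O-C, with different vertex numbering.
g1 = (["C", "O", "C"], make_edges([(0, 1, "s"), (1, 2, "s")]))
g2 = (["O", "C", "C"], make_edges([(0, 1, "s"), (0, 2, "s")]))
# A different graph: the path C-C-O.
g3 = (["C", "C", "O"], make_edges([(0, 1, "s"), (1, 2, "s")]))

seen = {canonical_code(*g1)}
is_duplicate = canonical_code(*g2) in seen  # g2 is the same pattern as g1
```

Practical miners use canonical forms that can be checked far more cheaply than this exhaustive search; the sketch only shows why a unique code per isomorphism class makes duplicate detection a set lookup.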

Keywords

graph mining; pattern mining; graph isomorphism

Parallel Abstract

As the mining of frequent itemsets and sequential patterns has matured, it is natural to explore other kinds of patterns, such as graph structures. Graph mining has a wide range of applications, including chemistry, biology, and computer networks. The main challenge in graph mining is solving the graph/subgraph isomorphism problem. We therefore propose an algorithm that combines established pattern mining techniques with graph mining techniques to mine all frequent subgraph patterns efficiently. Our algorithm adopts a canonical form to avoid duplicate enumeration and uses an effective embedding list structure to avoid subgraph isomorphism checking completely. Our empirical study on synthetic and real datasets demonstrates that the proposed algorithm, HybridGMiner, achieves a substantial performance gain over gSpan.
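The embedding list idea can be sketched as follows. This is an illustrative assumption about how such a structure might work, not HybridGMiner's actual data structure, which the abstract does not describe. Each pattern keeps a list of `(graph_id, mapped_vertices)` embeddings; support is counted from that list, and larger patterns are grown from existing embeddings rather than by re-testing subgraph isomorphism against every graph. All names and the example database are hypothetical.

```python
from collections import defaultdict

# Hypothetical database: graph id -> (vertex_labels, adjacency),
# where adjacency[u] maps each neighbor of u to the edge label.
db = {
    0: (["C", "O", "N"], {0: {1: "s"}, 1: {0: "s", 2: "s"}, 2: {1: "s"}}),
    1: (["C", "O"], {0: {1: "s"}, 1: {0: "s"}}),
    2: (["O", "C", "N"], {0: {1: "s", 2: "s"}, 1: {0: "s"}, 2: {0: "s"}}),
}


def support(embeddings):
    """Number of distinct database graphs that contain the pattern."""
    return len({gid for gid, _ in embeddings})


def extend_by_one_edge(db, embeddings):
    """Grow every embedding by one edge to a not-yet-mapped vertex.

    Returns {(pattern_position, edge_label, new_vertex_label): embeddings}.
    Only the neighborhoods of already-mapped vertices are scanned, so no
    subgraph isomorphism test against whole graphs is needed.
    """
    extensions = defaultdict(list)
    for gid, mapped in embeddings:
        labels, adjacency = db[gid]
        for pos, u in enumerate(mapped):
            for w, edge_label in adjacency[u].items():
                if w not in mapped:
                    key = (pos, edge_label, labels[w])
                    extensions[key].append((gid, mapped + (w,)))
    return extensions


# Embedding list for the single-edge pattern C-s-O: (graph id, (C vertex, O vertex)).
c_o = [(0, (0, 1)), (1, (0, 1)), (2, (1, 0))]
grown = extend_by_one_edge(db, c_o)
```

In this example the only extension attaches an `N` vertex to the `O` end of the pattern, and its support is read directly from the grown embedding list. A full miner would combine this with a canonical form to discard extensions that reproduce an already enumerated pattern.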

Parallel Keywords

pattern mining; graph isomorphism; graph structures; graph mining

