在社群網路串流中之樣式探勘

近年來，由於社群網站的蓬勃發展，許多的研究都集中在從社群網路中探勘出有用的知識，目前所有的方法大都是以圖論的方法為主，即從龐大的社群資料中，找出一些頻繁或具有代表性的子圖(sub-graph)。但以圖論為基礎的方法，在找尋所有頻繁子圖時的效率非常的差，就算有利用一些加速編碼(encoding)的方式或是一些修剪策略(pruning strategy)，效能方面還是依舊無法大幅改善。現今的社群網站資料都是非常的龐大，以圖論為基礎的子圖探勘方法通常只能適用於中小型的圖形資料，無法處理大型的社群網路圖資。社群資料通常會隨著時間有所變化，會有新的節點或新的鏈結加入，相對地也會有舊的節點或鏈結被移除，圖論的方法只能從一個時間點中的靜態社群網路粹取出循序樣式，對於社群網路的串流資料，並無法從中間找出時間演進所帶來的變化，這些都是以圖論為基礎的方法為人所詬病的地方。在本論文中，我們設計一個利用傳統的循序樣式探勘技術，從社群網路串流資料中，能有效率地探勘出具有時間演進代表性循序樣式。此外，提出一個新的演算法Streaming Evolution Pattern Miner (SEPM)，有效的提升探勘效率與維護頻繁序列。SEPM還採用了一個修剪策略，有效地減少記憶體使用，最後使用虛構資料集的實驗結果表示SEPM於社群網路串流中有高效地執行效率。

關鍵字

社群網路分析；資料串流探勘；循序樣式探勘

並列摘要

In recent years, due to the growth of social website, many studies have focused on discovery useful knowledge from the social networks. Currently, nearly all methods are based heavily on graph theory method. To find some frequent or representative subgraphs from the large social data. However, the graph-based approach is very inefficient identifying frequent subgraphs. Even if some methods of accelerating encoding or pruning strategies are implemented, performance is not improved significantly. Today's social networking community information is very large. Based on graph theory the sub-graph exploration methods are usually only applicable to small and medium-sized graphics data sets. They cannot handle large-scale social network graph data. Community information continually changes over time. There will be new nodes or new edges joining the set. Similarly, old nodes or edges will need to removed. This method can only be used to represent a snapshot of the static social network of streaming data, and cannot represent the evolution of the changes brought about by these are based on the graph theory methods criticized area. In this paper, we devised a method of using traditional sequential pattern mining technology from the social network streaming data to effectively discovery the evolution of representative sequential pattern over time. In addition, a new algorithm, Streaming Evolution Pattern Miner (SEPM), is proposed to effectively improve the efficiency of exploration and to maintain frequent sequences. Additionally, SEPM uses a pruning strategy to effectively reduce the use of memory. Finally, we utilize the synthetic dataset of experimental results to display improved SEPM social network streaming efficiency.

並列關鍵字

Social Network Analysis ； Data Stream Mining ； Sequential Pattern Mining

參考文獻

[1] Mendes, L., Ding, B., and Han, J. 2008. Stream sequential pattern mining with precise error bounds. In Proceeding of the 8th IEEE International Conference on Data Mining (ICDM 2008). 941–946

[2] Allen, J. 1983. Maintaining Knowledge about Temporal Intervals. Communications of ACM 26(11) 832-843.

[10] Hopper, F. 2002. Finding informative rules in interval sequences. Intelligent Data Analysis 6(3). 237-255.

[11] Inokuchi, A., and Washio, T. 2008. A fast method to mine frequent subsequences from graph sequence data. In Proceeding of the 8th IEEE International Conference on Data Mining (ICDM’08). 303-312.

[13] Kim, M., and Han, J. 2009. A Particle-and-Density based Evolutionary Clustering Method for Dynamic Networks. In Proceedings of the 35th International Conference on Very Large Data Bases (VLDB’09).

國際替代計量

在社群網路串流中之樣式探勘

全文下載

主題瀏覽