大型網路中以核心為基礎之社群偵測

隨著科技的發展，可搜集到的資料量驟增。與之所對應的網路變得大型與複雜。當我們社群偵測大型網路時，大多數現有的方法已不太適用。其原因是，大部份的演算法在處理網路時，每一階段都需查找網路中所有節點，但在大型網路是難以處理的，因為時間複雜度與計算複雜度都是急劇成長。根據上述原因，我們發展一個在大型網路的社群偵測演算法。在我們的論文中，我們定義一個子集合可以代表該集合的核心。在大型網路中實作社群偵測時，我們可以忽略集合並專注於代表該集合的核心。我們提出一個以核心為基礎的區域社群演算法，來證實集合的核心可以代表該集合，並且通過使用LFR基準圖形和DBLP網路（以作者為中心的論文合作網路）測試演算法。其演算法在社群強度大的LFR基準圖形表現良好，可以達到幾乎100%的查準率(Precision)和查全率(Recall)。然而，在DBLP網路則表現不佳，我們探討其原因是DBLP網路有重疊社群(overlapping communities)。最後我們發展一個在大型網路中以核心為基礎的社群偵測演算法，此演算法不需要在每一個階段都查找網路中所有節點。通過LFR基準網路和DBLP網路，我們找出一個適當的核心參數，並且和其他大型網路社群偵測演算法比較分群評價優劣。

關鍵字

社群偵測；巨量資料；大型網路

並列摘要

With the technological development, the volume of collected data has been increased very rapidly. The corresponding graphs become much larger and more complex. When we process the large-scale networks, the most existing methods [1] do not work. The reason is that most of the algorithms need to trace the whole network for each step, but the network is too huge to handle. Both the time complexity and the computational complexity have grown up rapidly. For this reason, we are interested in developing a community detection algorithm for solving the large-scale networks without tracing the whole network for each step. In our thesis, we define the core of a set that can be used to represent the set. In large-scale networks we can ignore the set S and focus on the core of the set S during the process of community detection. We propose an algorithm, called the core-based local community detection algorithm, to verify that the core of a set can represent the set, and test the algorithm by using the LFR benchmark graphs and the DBLP co- authorship network. The core-based local community detection algorithm performs well in the LFR benchmark graphs. For communities with strong community strength in the LFR benchmark graphs, this approach could reach almost 100% precision and 100% recall. However, the core-based local community detection algorithm does not performs well when the communities of the networks have overlapping communities. Finally we develop a core-based community detection algorithm for large-scale net- works, which needs not trace the whole network for each step, and also applies the algorithm to the LFR benchmark graphs and the DBLP co-authorship network. We also compare with three different methods, the label propagation, the fast unfolding and the greedy optimization of modularity, respectively.

並列關鍵字

community detection ； big data ； large-scale networks

參考文獻

[1] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.

[3] Mason A Porter, Jukka-Pekka Onnela, and Peter J Mucha. Communities in net- works. Notices of the AMS, 56(9):1082–1097, 2009.

[4] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical review E, 69(2):026113, 2004.

[7] Jordi Duch and Alex Arenas. Community detection in complex networks using extremal optimization. Physical review E, 72(2):027104, 2005.

[8] Usha Nandini Raghavan, R ́eka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.

國際替代計量

大型網路中以核心為基礎之社群偵測

主題瀏覽