在這篇論文裡,我們提出了一個以相關係數為基礎的演算法,可以用以偵測大型網路中互相重疊的社群結構 (overlapping community structure)。首先,對於一個由節點形成的集合,我們定義此集合的自我相關係數 (self correlation) 做為描述此集合重要性的依據,接著為了描述此集合中每一個節點與同一集合之其它節點的相關程度,我們定義了此集合的相關強度 (correlation intensity)。在我們的演算法中,最主要的想法是讓一個節點形成的集合漸漸地增大(透過不斷地加入新的節點),在增大的過程中盡可能地將此集合的自我相關係數最大化,同時在這過程中此集合的相關強度也要維持在一個門檻值以上,此門檻值可以由一個複雜網路的廣義分支度分佈 (generalized node degrees) 設計出來,並且會使得我們的演算法所產生出來的每個節點集合具有一定的特性。在模擬方面,我們用電腦產生出具有內建的互相重疊之社群結構的網路,並且將我們的演算法應用在這些網路,透過大量的測試,結果顯示我們的演算法有很好的表現。此外,我們也將我們的演算法應用在一個真實世界的網路「空手道社」上,在這個網路裡,我們的演算法所偵測到的社群結構非常接近這個網路已知的社群結構。
In this thesis, we propose a correlation-based algorithm for detecting overlapping communities in networks (graphs) based on a general probabilistic framework introduced in [20].For a set of nodes in a graph, we first define the self correlation of the set for measuring its importance, and then we define the correlation intensity of the set for describing how strongly each node in the set is correlated to the other nodes in the set. The key idea in our correlation-based algorithm is to locally maximize the self correlations of sets of nodes in a greedy manner while maintaining the correlation intensities of those sets to be above a given threshold. Given the generalized node degrees of a graph, the threshold can be chosen so that every set of nodes generated by our algorithm possesses certain properties. Through extensive computer simulations of random graphs with built-in overlapping community structure, we show that the performance of our algorithm is quite good. Furthermore, we apply our algorithm to the real-world network “Karate club” and show that the overlapping communities detected by our algorithm are very close to the known communities in this graph.