透過您的圖書館登入
IP:3.138.141.202
  • 學位論文

運用階層式普瓦松模型進行無向單源圖上的社群偵測

A Hierarchical Poisson Model for Community Detection of Undirected Single-edge Graphs

指導教授 : 盧鴻興

摘要


現今圖形資料 (Graph data) 已然成為資料科學研究中最重要資料類型 之一,處理大數據資料更是另一項挑戰。許多文獻藉由這群偵測作為分析 資料的首要步驟。本文提出的普瓦松分群模型以不同的角度進行社群偵測, 我們假設原始資料的連結可以是多邊連結的 (Multi-edged),儘管實際上我 們經常只能取得到二元的資料 (Binary, single-edged),但我們仍可透過模擬 的方式取得被忽略的權重的估計值,再以估計出來的網絡資料進行分群。 再者我們假設資料裡的每一個節點都有一組特徵參數,代表著與他人發生 連接傾向,利用 Ball et al. 所提出的技巧,我們可以快速地估計出模型裡的 參數,因此我們的模型可以說是他們的一般化模型。參數估計的方法為條 件最大期望演算法 (Conditional EM Algorithm)。另外,我們使用修正的赤 池信息量準則 (AICc) 作為模型選擇的標準,用以找尋最適當的分群群數。 我們將次論文提出的方法使用在合成和實際資料上,分群結果有效而且快 速。整體演算法的計算複雜度為 O(N2K),然而和其他最佳化演算法相比, 我們使用的最大期望演算法所需的收斂次數相對較少,因此此方法有潛力 被應用在較大的資料上。

關鍵字

圖形資料 社群偵測

並列摘要


As graph data gain great popularity in the last decade, network analysis becomes an important research work. What’s more, in the big data era, han- dling large graph is the next challenge. Many researches have been working on exploring graph data starting by community detection. Poisson Model provides a di erent view of point to carry out the detection by assuming the raw networks are multi-edged. Although some weight information is missing and only single edges are observed, we design a mechanism to estimate the weight. Then we assume each node has a feature of propensity to connect to other nodes and take advantage of the fast optimization technics of Ball et al. for parameter estimation. In a sense, our model can be regarded as a generalized Ball et al.’s model. Conditional EM algorithm is applied to carry out the estimation. Next, AICc is served as our model selection criteria for choosing number of groups. The computational complexity is O(N2K). Ac- cording to the results of synthesized and real data, our method is e ective and fast. Compared to other optimization algorithm, the required number of iteration of EM algorithm is relatively fewer, therefore having a potential to be applied to large graphs.

並列關鍵字

Graph Data Community Detection

參考文獻


[1] Krzysztof Nowicki and Tom A B Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 2001.
[3] Pierre Latouche, Etienne Birmelé, and Christophe Ambroise. Overlapping stochastic block models with application to the french political blogosphere. The Annals of Applied Statistics, 2011.
[4] Guan-Chun Chen and Henry Horng-Shing Lu. A factor model for graph data. Master Thesis of National Chiao Tung University, 2015.
[5] Brian Ball, Brian Karrer, and Mark EJ Newman. An e cient and principled method for detecting communities in networks. Physical Review E, 2011.
[6] Cli ord M Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 1989.

延伸閱讀