異質性社群網路取樣

社群網路分析近年來因為社群網路的崛起而成為一個相當熱門的領域，一些網路上有名的社群網路，例如Plurk、Twitter以及Facebook每天吸引上百萬人次造訪。這些人與人之間的互動資訊，提供了社群網路分析研究者寶貴的資料來源，但是，並不是每個商業性社群網站都將自己的客戶資料公諸於世，有時工程師需要寫程式一筆一筆的將資料從網路上擷取下來，為了在有限的資源下擷取到最多、最具代表性的資料，網路取樣成為社群網路分析不可或缺的一項技術。到目前為止，許多社群網路的性質被發現及應用在比較不同社群網路間的差異性，但是這些網路性質往往都是根據同質性社群網路所設計，然而，真實世界中充滿著異質性社群網路，人與人之間的關係有著各種不一樣的形式，如何在異質性社群網路上做取樣，是一個還未被深入探討的問題。在這篇論文中，我們提出了一個社群網路性質「Relational Profile」，它能夠表現出異質性社群網路中形別之間的關聯性，我們並運用這個性質，提出了一個新的網路取樣方法「Relational Profile Sampling」，實驗結果證明此方法能夠有效的保存異質性社群網路的Relational Profile。為了更進一步證明我們提出的Relational Profile以及Relational Profile Sampling能夠有效的使用在社群網路分析，我們利用機器學習的方法在社群網路中實作物體形別預測。實驗結果發現，Relational Profile能夠在資訊短缺或不足的情況下，準確地預測出物體的形別。在不同的取樣方法中，Relational Profile Sampling所產生出的子網路，也較其他取樣方法產生出的子網路更能準確地預測出物體的形別。

關鍵字

社群網路分析；異質性網路；網路取樣；形態預測

並列摘要

Social network analysis has been a hot topic in the last few years. Due to the rise of social network sites such as Twitter, Plurk, and Facebook, large amount of data is available for the research world. However, not all social network sites make their full customer database available to general public. Often times, engineers need to write crawlers to crawl websites just to obtain parts of the social network. Data Sampling has been widely used to extract a subset of social network to represent the larger network. Various network properties have been proposed to measure the similarity between the sampled sub-networks with the original network. However, some of the properties only work for homogenous networks, in which nodes and edges are treated the same. In this thesis, we propose a novel network property, the Relational Profile to model the transitional probability between node and link types in a heterogeneous social network, networks which nodes and edges have different types. We propose a novel sampling by exploration method with the goal to sample a sub-network whose Relational Profile is as close to the Relational Profile of the original network as possible. The experiment result shows that our sampling method produces a more representative sub-network with less sampled nodes and edges. Then we try to solve a real world problem, node type prediction, using Machine Learning method with sampled sub-network as training data. Experiment shows that using Relational Profile as features works better than other features, such as in and out degree, as Relational Profile is more resistant to neighbors of testing nodes missing. Also, with the same amount of nodes sampled, sub-networks created by our sampling method can predict node types with higher accuracy than other baseline methods.

並列關鍵字

Social Network Analysis ； Heterogeneous Network ； Graph Sampling ； Node Type Prediction

參考文獻

[2] Leskovec et al. Sampling from large graphs. Proceedings of the 12th ACM SIGKDD (2006)

[3] Ma et al. Ego-centric Network Sampling in Viral Marketing Applications. Mining and Analyzing Social Networks (2010)

[4] Jhao-Yin Li and Mi-Yen Yeh. On Sampling Type Distribution from Heterogeneous Social Networks. PAKDD (2011)

[5] Maiya et al. Sampling community structure. WWW (2010)

[6] S. Ye, J. Lang, and F. Wu, Crawling Online Social Graphs. APWeb (2010)

國際替代計量

異質性社群網路取樣

全文下載

主題瀏覽