
Cluster Cascades: Infer Multiple Information Networks Using Diffusion Data (叢集傳播紀錄:用傳播資料推論多個訊息網路)

Advisor: 陳銘憲

Abstract


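Information diffusion and virus propagation are fundamental processes that frequently occur on networks, and designing strategies to facilitate or block these processes has recently attracted a great deal of attention. The biggest problem, however, is that the transmission channels are often hidden. In other words, we can observe when the nodes in a network are "infected" by a piece of information, but we cannot know how the information reached them. Most approaches to this kind of problem assume that there is one underlying network over which information spreads. In practice, however, whether an information channel exists depends on many factors, such as the topic of the information and the time of transmission. For example, political news spreads differently from sports news or other types of news. Political news itself also spreads differently at different times: during an election, information spreads faster than usual. In such cases, it is quite difficult to model the whole process with only one network. In this thesis, we propose an algorithm, MixCascades, that clusters similar cascades and infers a corresponding network for each cluster. In addition, we propose a method that automatically selects an appropriate number of clusters. Using synthetic and real data, we find that our algorithm clusters historical cascades very efficiently and recovers the true networks.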

Keywords

clustering diffusion network

Parallel Abstract


Information diffusion and virus propagation are fundamental processes that often take place in networks. The problem of devising a strategy to facilitate or block such processes has received considerable attention. However, a major challenge is that transmission pathways are often hidden. In other words, one can observe only cascades, that is, the time stamps at which nodes are infected by events, but not where or from whom the nodes were infected. Most research on this problem assumes a single underlying network over which cascades spread. In the real world, whether a transmission pathway for a contagion (a piece of information) emerges depends on many factors, such as the topic of the information and the time when the information is first mentioned. Political news, for example, spreads differently from sports news. Political news itself also spreads differently over time: it spreads much faster during an election than usual. Therefore, it is hard to model the diffusion processes with a single network when the information is of many kinds. In this thesis, we propose a probabilistic generative mixture model for the generation of cascades, that is, the time stamps at which nodes mention a piece of information. Our algorithm, MixCascades, clusters similar cascades and infers a corresponding underlying network for each cluster within the expectation-maximization framework. In addition, our algorithm determines the number of clusters automatically. On both synthetic and real cascade data, we show that our algorithm clusters cascades and recovers the underlying networks effectively.
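
The idea of clustering cascades and fitting one network per cluster with expectation-maximization can be illustrated with a small sketch. This is not the MixCascades model, whose likelihood is not given in this abstract. As a stand-in, it assumes a much simpler mixture of Bernoulli distributions over ordered node pairs: each cascade is reduced to a binary vector marking which node was infected before which, and each cluster keeps one "edge probability" per ordered pair. The function names, the farthest-point initialization, the smoothing constant, and the fixed number of clusters `k` are all assumptions made for illustration.

```python
import math
from itertools import permutations


def cascade_to_vector(cascade, nodes):
    """Binary vector over ordered pairs (i, j): 1 if i was infected before j."""
    return [
        1 if i in cascade and j in cascade and cascade[i] < cascade[j] else 0
        for i, j in permutations(nodes, 2)
    ]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_seeds(vectors, k):
    """Farthest-point initialization (an assumption): spread k seed cascades apart."""
    seeds = [0]
    while len(seeds) < k:
        best = max(
            (c for c in range(len(vectors)) if c not in seeds),
            key=lambda c: min(hamming(vectors[c], vectors[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def em_cluster(cascades, nodes, k, iterations=20, smoothing=1.0):
    """EM for a Bernoulli mixture over ordered pairs (stand-in, not MixCascades)."""
    vectors = [cascade_to_vector(c, nodes) for c in cascades]
    n_pairs = len(vectors[0])
    # Start each cluster near its seed cascade: 0.75 where the pair occurs, else 0.25.
    probs = [[0.75 if x else 0.25 for x in vectors[s]] for s in pick_seeds(vectors, k)]
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each cascade, in log space.
        resp = []
        for v in vectors:
            logs = [
                math.log(weights[j])
                + sum(math.log(p if x else 1.0 - p) for x, p in zip(v, probs[j]))
                for j in range(k)
            ]
            top = max(logs)
            unnorm = [math.exp(value - top) for value in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: smoothed updates keep every probability strictly inside (0, 1).
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = (mass + smoothing) / (len(vectors) + k * smoothing)
            probs[j] = [
                (sum(r[j] * v[e] for r, v in zip(resp, vectors)) + smoothing)
                / (mass + 2.0 * smoothing)
                for e in range(n_pairs)
            ]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, probs


# Toy data (made up): three cascades spread 0 -> 1 -> 2, three spread 2 -> 1 -> 0.
nodes = [0, 1, 2]
cascades = [
    {0: 0.0, 1: 1.0, 2: 2.0},
    {0: 0.0, 1: 0.4, 2: 1.5},
    {0: 0.0, 1: 2.0, 2: 2.5},
    {2: 0.0, 1: 1.0, 0: 2.0},
    {2: 0.0, 1: 0.7, 0: 0.9},
    {2: 0.0, 1: 1.2, 0: 3.0},
]
labels, probs = em_cluster(cascades, nodes, k=2)
print(labels)
```

In this toy run the two spreading directions end up in different clusters, and the per-cluster pair probabilities can be read as a rough picture of each cluster's network. A faithful implementation would replace the Bernoulli pair model with a likelihood over infection times and would also choose the number of clusters, neither of which this sketch attempts.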

Parallel Keywords

clustering diffusion network

