Clustering Cascades: Inferring Multiple Information Networks from Diffusion Data

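Information dissemination and virus propagation are fundamental processes that frequently occur on networks, and designing strategies to promote or block them has recently received considerable attention. The main difficulty, however, is that transmission pathways are often hidden: we can observe when each node in a network is "infected" by a piece of information, but not how the infection reached it. Most methods for this problem assume a single underlying network over which the information spreads. In practice, however, whether a transmission pathway exists depends on many factors, such as the topic of the information and the time at which it spreads. For example, political news spreads differently from sports news and other kinds of news, and political news itself spreads differently at different times: during an election it travels faster than usual. Under these conditions, modeling the entire process with a single network is difficult. In this thesis, we propose an algorithm, MixCascades, that clusters similar cascades and infers a corresponding network for each cluster. We also propose a method for selecting an appropriate number of clusters automatically. Experiments on synthetic and real data show that our algorithm clusters historical cascades very efficiently and recovers the true networks.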

Keywords

clustering; diffusion; network

Parallel Abstract (English)

Information diffusion and virus propagation are fundamental processes that often take place in networks, and the problem of devising strategies to facilitate or block such processes has received considerable attention. A major challenge, however, is that transmission pathways are often hidden: one can observe only cascades, the time stamps at which nodes are infected, but not where or from whom each node was infected. Most research on this problem assumes a single underlying network over which cascades spread. In the real world, however, whether the transmission pathways of a contagion, such as a piece of information, emerge depends on many factors, such as the topic of the information and the time at which it is first mentioned. Political news, for example, spreads differently from sports news, and political news itself spreads differently over time: it spreads much faster during an election than usual. It is therefore hard to model the diffusion processes with a single network when the information is of many kinds. In this thesis, we propose a probabilistic generative mixture model of how cascades, the time stamps at which nodes mention information, are generated. Our algorithm, MixCascades, clusters similar cascades and infers a corresponding underlying network for each cluster within the expectation-maximization (EM) framework. In addition, our algorithm determines the number of clusters automatically. On both synthetic and real cascade data, we show that our algorithm clusters cascades and recovers the underlying networks very effectively.
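The abstract names the ingredients (a probabilistic generative mixture of cascade models fitted with EM) but not the transmission model, so the following is only an illustrative sketch under stated assumptions, not the thesis's MixCascades implementation. It assumes an exponential pairwise transmission model of the kind used by continuous-time network inference methods such as NetRate, with a rate `alpha[j, i]` for transmission from node j to node i and an observation window [0, T], and it fixes the number of clusters `K` instead of selecting it automatically. Adding a latent "who infected whom" variable for every infected node makes the M-step closed form: each cluster's rate is updated as `alpha_k[j, i] = sum_c w[c, k] q[j, i] / sum_c w[c, k] D[j, i]`, where `w` are cluster responsibilities, `q` are parent responsibilities, and `D` are exposure times. The function names (`exposure`, `cascade_loglik`, `mix_cascades_em`) and the toy cascades are hypothetical.

```python
import numpy as np


def exposure(times, T):
    """D[j, i]: how long infected node j was able to infect node i.

    times: (N,) infection times, np.inf for nodes not infected within [0, T].
    """
    t_end = np.where(np.isfinite(times), times, T)  # uninfected nodes stay exposed until T
    d = t_end[None, :] - times[:, None]             # d[j, i] = t_end[i] - t[j]
    return np.where(np.isfinite(times)[:, None] & (d > 0), d, 0.0)


def cascade_loglik(D, P, alpha):
    """Log-likelihood of one cascade under exponential transmission rates alpha[j, i]."""
    hazard = (alpha * P).sum(axis=0)  # total rate into i from nodes infected before i
    has_parent = P.any(axis=0)        # infected nodes that are not seeds
    with np.errstate(divide="ignore"):
        return np.log(hazard[has_parent]).sum() - (alpha * D).sum()


def mix_cascades_em(cascades, T, K, n_iter=50, seed=0):
    """Hypothetical EM for a K-component mixture of exponential cascade models."""
    N = len(cascades[0])
    rng = np.random.default_rng(seed)
    alphas = rng.uniform(0.5, 1.5, size=(K, N, N))  # random positive start
    pis = np.full(K, 1.0 / K)
    Ds = [exposure(t, T) for t in cascades]
    # P[j, i] is True when j was infected strictly before the infected node i
    Ps = [(D > 0) & np.isfinite(t)[None, :] for D, t in zip(Ds, cascades)]
    history = []  # total log-likelihood before each M-step
    for _ in range(n_iter):
        # E-step over clusters: log pi_k + log p(cascade c | alpha_k)
        with np.errstate(divide="ignore"):
            logw = np.log(pis)[None, :] + np.array(
                [[cascade_loglik(D, P, alphas[k]) for k in range(K)]
                 for D, P in zip(Ds, Ps)])
        m = logw.max(axis=1, keepdims=True)
        history.append(float((m[:, 0] + np.log(np.exp(logw - m).sum(axis=1))).sum()))
        w = np.exp(logw - m)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then closed-form rate updates per cluster
        pis = w.mean(axis=0)
        for k in range(K):
            num, den = np.zeros((N, N)), np.zeros((N, N))
            for c, (D, P) in enumerate(zip(Ds, Ps)):
                A = alphas[k] * P
                h = A.sum(axis=0)
                # E-step over parents: q[j, i] = P(j infected i | cascade c, cluster k)
                q = np.divide(A, h, out=np.zeros_like(A), where=h > 0)
                num += w[c, k] * q
                den += w[c, k] * D
            alphas[k] = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return pis, alphas, w, history


inf = np.inf
cascades = [np.array(c) for c in [
    [0.0, 0.1, 0.25, inf], [0.0, 0.2, 0.3, inf], [0.0, 0.15, 0.4, inf],  # 0 -> 1 -> 2
    [0.0, inf, 0.3, 0.1], [0.0, inf, 0.35, 0.2], [0.0, inf, 0.5, 0.05],  # 0 -> 3 -> 2
]]
pis, alphas, w, history = mix_cascades_em(cascades, T=1.0, K=2)
print(w.argmax(axis=1))  # most likely cluster of each toy cascade
```

EM only guarantees that `history` does not decrease; from a random initialization it can still stop at a local optimum, so in practice one would run several restarts and keep the best fit.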

Parallel Keywords (English)

clustering; diffusion; network

