  • Thesis

基於主題目模型的用戶分群應用

Application of User Clustering Based on Topic Modeling

Advisor: 陳景祥

Abstract


With the advancement of network technology, social media has come into wide public use. People express their opinions on social networks such as Facebook and Twitter. These posts can reveal a great deal about users, such as the things they like and their ideological leanings. This information can be used to group users, which supports subsequent research and analysis or the pursuit of commercial benefit. In this thesis, we collect the posts that users publish on social networks and apply a topic model to find the topics each user commonly writes about. Given the topic distribution of each user, we group similar users with clustering methods such as k-means and affinity propagation. We also take time into account and examine how users' topics and cluster assignments change from one time slice to the next. Finally, we apply the method to data from PTT to show how user clustering performs on Chinese-language posts and to report what we found.
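The pipeline described in the abstract (topic model first, clustering second) can be sketched in a few lines. The following is a minimal illustration only, not the thesis's implementation: the abstract does not say which inference method or hyperparameters were used, so the collapsed Gibbs sampler, the values of `alpha` and `beta`, the function names `lda_gibbs` and `kmeans`, and the toy `user_posts` data are all assumptions made for the example. Each user's posts are merged into one document, a two-topic model is fitted, and the per-user topic distributions are clustered with k-means.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    docs is a list of token lists; returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens of word v in topic k
    topic_total = [0] * n_topics                            # tokens in topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic distribution.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means with squared Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: every user's posts are merged into a single document.
user_posts = {
    "user_a": ["baseball game tonight", "baseball team won the game"],
    "user_b": ["team lost the baseball game"],
    "user_c": ["new phone camera review", "phone battery and camera test"],
    "user_d": ["camera lens review", "phone review"],
}
users = list(user_posts)
docs = [" ".join(posts).split() for posts in user_posts.values()]

theta = lda_gibbs(docs, n_topics=2)
labels = kmeans(theta, k=2)
for user, dist, lab in zip(users, theta, labels):
    print(user, [round(p, 2) for p in dist], "cluster", lab)
```

Clustering on the topic distributions rather than on raw word counts keeps the feature space small, one dimension per topic. Affinity propagation, the other method named in the abstract, could replace `kmeans` here and would not need the number of clusters fixed in advance. For the time-slice analysis, one plausible approach is to repeat the same two steps on the posts from each interval and compare the resulting labels.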

