  • Thesis

Investigation of Predicting Information Propagation on Social Network: Using Data from Twitter

Advisor: 曹承礎

Abstract


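With the development of mobile devices and wireless networks, social networking sites have become an indispensable part of people's lives. On well-known sites such as Facebook, Twitter, LinkedIn, and Plurk, thousands upon thousands of users maintain interpersonal relationships, express themselves, and share anecdotes every day. Many studies have used the enormous volume of information generated in this process for analysis and applications in marketing, social behavior, financial forecasting, and disease and disaster prevention. Among social networking sites, Twitter is classified as a microblog and is the world's second-largest social networking site. Because each post may not exceed 140 characters, it works much like a text-messaging service for the Internet, which implicitly increases how often users post and improves the efficiency of information propagation. Retweeting is an important mechanism for information propagation on Twitter: it allows useful and interesting messages to spread explosively. Given Twitter's diverse users and the ease of spreading messages, understanding which characteristics make a post more likely to be retweeted has become an important issue.

This study uses twitter4j to collect data from Twitter and applies data mining methods to build predictive models, based on content features, post features, and author features, that predict the level of the number of retweets a post receives. In the pretest stage, we use three data mining methods, Support Vector Machine, Naive Bayes, and Decision Tree, to build predictive models and compare their predictive ability. We then select the best-performing method and use it to build separate predictive models for the content features, the post features, the author features, and the special variables proposed in this study, and compare their predictive ability.

Our experiments use the data mining tool Weka to build the predictive models, together with 10-fold cross-validation for model training. The results show that Decision Tree is the best-performing data mining method in the pretest stage, and that the predictive model built on the ten special variables proposed in this study has better overall predictive power than any of the models built on content features, post features, or author features alone. However, each model's predictive power varies across retweet-count levels, each with its own strengths, so it is inadvisable to rely on a single feature group alone.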

Keywords

Twitter; information propagation; influence analysis; prediction

Parallel Abstract


With the development of mobile devices and wireless networks, social networking has become an indispensable part of people's lives. Well-known social networking sites such as Facebook, Twitter, LinkedIn, and Plurk have large numbers of users who maintain interpersonal relationships, present themselves, and share anecdotes through these sites every day. Many studies have used the large volume of information generated in this process for analysis and applications in marketing, social behavior, financial forecasting, and disease and disaster prevention. Twitter is classified as a microblogging service and is the second-largest social networking site in the world. Because each post is limited to 140 characters, Twitter works much like an SMS service for the Internet, which implicitly increases how often users post and improves the efficiency of information propagation. Retweeting is the key mechanism for information propagation on Twitter, and it has emerged as a simple yet powerful way of disseminating useful information. Given Twitter's diverse user base and the convenience of propagating information, understanding what kinds of posts are more likely to be retweeted has become an important issue. In this study, we collected data from Twitter using twitter4j and used data mining techniques to build predictive models that predict the level of a post's retweet count, based on content features, post features, author features, and the special variables proposed in this study. In the pretest stage, we chose Support Vector Machine, Naive Bayes, and Decision Tree to build predictive models and compared their performance. We then selected the best-performing method and used it to build a separate predictive model for each of the feature groups mentioned above. Our experiments were carried out with Weka, a data mining tool, and used 10-fold cross-validation in training the predictive models.
The experimental results show that Decision Tree was the best-performing data mining method in the pretest stage. Comparing the predictive models, we found that the model built on the special variables proposed in this study performed best overall among all feature groups. However, each predictive model has different predictive power at different retweet-count levels, so relying on only one feature group to build a predictive model would be biased.
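
The evaluation procedure described above (binning retweet counts into levels, then scoring a classifier with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration in plain Python and not the Weka workflow used in the study. The level thresholds, the single numeric feature (a follower count), the depth-1 decision tree standing in for a full decision tree, and the synthetic data are all assumptions made for illustration only.

```python
import random
from collections import Counter


def retweet_level(count, thresholds=(0, 10, 100)):
    """Map a raw retweet count to an ordinal level.

    The thresholds are hypothetical; the abstract does not state the cut-offs.
    """
    return sum(1 for t in thresholds if count > t)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_stump(rows):
    """Fit a depth-1 decision tree (a stump) on one numeric feature.

    rows: non-empty list of (feature_value, label) pairs.
    Returns (threshold, label_if_x_le_threshold, label_if_x_gt_threshold).
    """
    best = None
    for thr in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= thr]
        right = [y for x, y in rows if x > thr]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = sum(
            y == (left_label if x <= thr else right_label) for x, y in rows
        )
        if best is None or correct > best[0]:
            best = (correct, thr, left_label, right_label)
    return best[1:]


def predict(model, x):
    """Predict a label for feature value x with a fitted stump."""
    thr, left_label, right_label = model
    return left_label if x <= thr else right_label


def cross_validate(rows, k=10, seed=0):
    """Return mean accuracy over k folds; each fold is held out exactly once."""
    correct = 0
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        model = train_stump(train)
        correct += sum(predict(model, rows[i][0]) == rows[i][1] for i in held_out)
    return correct / len(rows)


if __name__ == "__main__":
    # Synthetic data (made up): followers -> retweets, then retweets -> level.
    data = [(f, retweet_level(f // 20)) for f in range(0, 1000, 10)]
    print("10-fold accuracy:", cross_validate(data, k=10))
```

Comparing feature groups, as the study does, would amount to calling `cross_validate` once per feature set and comparing the resulting accuracies, ideally per retweet level as well as overall.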

