A Study on Identifying Suspicious Accounts Using Anomaly Detection Techniques

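In recent years, threats such as "fake news" and "disinformation" have risen to the level of a national security concern in information warfare and have become a research priority for many countries. This is not a new phenomenon, however: as early as 2014, when Russia intervened to influence the referendum on the status of Crimea in Ukraine, and again in the recent Russo-Ukrainian war, opinion manipulation on social media can be observed on the part of Russia and other countries alike. This thesis therefore focuses on the accounts and posts that spread suspicious messages. We use the data published on Twitter's official "Transparency" website, where Twitter defines suspicious accounts as disinformation-manipulation accounts linked to a government or state and releases the accounts and posts that its investigations have confirmed as suspicious. Unlike previous identification approaches, we apply anomaly detection, a machine learning technique, to train a classifier that distinguishes anomalous messages and anomalous accounts with high accuracy. For data collection, we built a crawling system based on the ETL framework and crawled the official accounts and tweets of celebrities. We then use normal posts published by officially verified "blue check" accounts to check how often the classifier misjudges legitimate content. The experimental results show an accuracy of 96%, which is a strong result.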

Keywords

suspicious account; disinformation; natural language processing; machine learning; anomaly detection; ETL; crawler

Parallel Abstract

In recent years, threats such as "fake news" and "disinformation" have reached the level of national security in information warfare and have become an important research issue. The problem is not new: as early as 2014, Russia intervened to influence the Crimea referendum in Ukraine, and in the recent Russo-Ukrainian war we can again see public opinion being steered on social media, whether by Russia or by other countries. This thesis focuses on the accounts and posts that publish suspicious information and uses data from Twitter's official Transparency website. Twitter defines suspicious accounts as accounts linked to a government or state that manipulate disinformation, and it publishes such accounts and their posts after investigation and confirmation. Unlike previous identification methods, we use anomaly detection, a machine learning technique, to train a classifier that can distinguish abnormal messages and abnormal accounts with high accuracy. For the dataset, we established a data crawling system based on the ETL framework and crawled the official accounts and tweets of celebrities. We then use normal posts from accounts with a blue tick, whose identities have been officially confirmed, to verify the performance of the classifier. The experimental results show that the accuracy of our identification method reaches 96%.

Parallel Keywords

suspicious account; misinformation; natural language processing; machine learning; anomaly detection; ETL; crawler
