
基於異質圖神經網路與使用者文章關鍵字交互學習應用於假新聞偵測

Fake news detection based on heterogeneous graph neural network via user-post-keyword interaction learning

Advisor: 蔡政安

Abstract


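In recent years, with the rapid development of the Internet, social media has become an inseparable part of people's lives. The Internet makes it easier to obtain information or express one's own ideas, but it has also given rise to the problem of information overload, so the authenticity of articles has become an issue of wide concern. In this study, we propose a model based on a heterogeneous graph that combines the advantages of content-based and network-based methods and uses the interactions among articles, keywords, and retweeters to detect fake articles on social media. The model not only uses two tokenization methods (General Tokenizer and BertTokenizer) to access the information in the raw text, but also records the propagation paths of retweeters, in order to capture patterns of fake news from different aspects. In addition, we represent each article with several keywords, because some fake articles contain particular words, and we use a graph attention network to learn the interactions among articles, retweeters, and keywords. Finally, we combine Bidirectional Long Short-Term Memory and the graph attention network, and use the updated vectors to predict the authenticity of an article. The thesis also presents a comprehensive comparison between different methods and our model through extensive experiments. The results show that the proposed model generally performs better on different datasets (Twitter15, Twitter16, and FakeNewsNet), with fake-article detection accuracy as high as 95%, clearly outperforming other methods by 4%.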
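
The article, keyword, and retweeter relations described in the abstract can be pictured as typed edge lists. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `build_hetero_graph`, the field names `id`, `tokens`, and `retweeters`, the relation labels, and the frequency-based keyword rule are all hypothetical, since the abstract does not say how keywords are chosen.

```python
from collections import Counter, defaultdict


def build_hetero_graph(posts, num_keywords=3):
    """Build typed edge lists linking posts to keywords and retweeters.

    `posts` is a list of dicts with hypothetical fields 'id', 'tokens'
    and 'retweeters' (retweeters listed in propagation order).
    """
    edges = defaultdict(list)
    for post in posts:
        # Assumption: keywords are the most frequent tokens, with ties
        # broken alphabetically. The abstract does not state the rule.
        counts = Counter(post["tokens"])
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        for keyword in ranked[:num_keywords]:
            edges[("post", "has_keyword", "keyword")].append((post["id"], keyword))
        # One edge per retweeter, kept in propagation order.
        for user in post["retweeters"]:
            edges[("user", "retweets", "post")].append((user, post["id"]))
    return dict(edges)


# Made-up sample data, for illustration only.
SAMPLE_POSTS = [
    {"id": "p1", "tokens": ["shocking", "cure", "shocking", "miracle"],
     "retweeters": ["u1", "u2"]},
    {"id": "p2", "tokens": ["council", "votes", "budget"],
     "retweeters": ["u2"]},
]

graph = build_hetero_graph(SAMPLE_POSTS, num_keywords=2)
for edge_type, pairs in graph.items():
    print(edge_type, pairs)
```

In this toy graph, retweeter `u2` is attached to both posts, which is the kind of shared neighbour that a graph model can use to pass information between articles.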

Parallel abstract


In recent years, with the rapid development of the Internet, social media has become an inseparable part of people's lives. The Internet makes it easier to obtain information and express one's thoughts, but it has also led to an overabundance of information, so the authenticity of articles has become a primary public concern. In this research, we propose a novel method for detecting fake news on social media: given an article and its retweeters, without text comments, we aim to predict whether the article is fake. We propose a graph-based model that combines the advantages of content-based and network-based learning models. The model not only draws on the raw text through two tokenization methods, General Tokenizer and BertTokenizer, but also records the propagation path of the retweeters, so as to capture patterns of fake news from different aspects. In addition, we represent each article by several keywords, because some fake news contains characteristic words. We construct a heterogeneous graph and apply a graph attention network to capture the interactions among news articles, retweeters, and keywords. Finally, we apply two further methods, Bidirectional Long Short-Term Memory (BiLSTM) and the graph attention network (GAT), to learn the article representations used to determine whether an incoming article is fake. We comprehensively compare different content-based and network-based methods through extensive experiments. The results show that our proposed model generally performs better across datasets (Twitter15, Twitter16, and FakeNewsNet), reaching an accuracy of up to 95% and clearly outperforming baseline methods by 4%.
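
To make the graph attention step concrete, here is a minimal single-head sketch of how one article node could weight and aggregate its neighbouring keyword and retweeter nodes. It follows the standard graph attention formulation (scores from a LeakyReLU over concatenated projected features, normalised by a softmax), not necessarily the thesis's exact layer. The function `gat_aggregate`, the weight matrix `W`, the attention vector `a`, and all numbers are hypothetical. In a trained model `W` and `a` would be learned, and the node usually also attends to itself, which is omitted here.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_aggregate(h_target, h_neighbors, W, a):
    """Single-head graph-attention update for one node.

    Requires at least one neighbour; `a` has length 2 * output dimension.
    Returns the updated node vector and the attention weights.
    """
    z_i = matvec(W, h_target)
    z_js = [matvec(W, h) for h in h_neighbors]
    # Unnormalised score for each neighbour: LeakyReLU(a . [z_i || z_j]).
    scores = [
        leaky_relu(sum(ak * v for ak, v in zip(a, z_i + z_j)))
        for z_j in z_js
    ]
    # Numerically stable softmax over the neighbourhood.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the projected neighbour features.
    updated = [
        sum(alpha * z_j[d] for alpha, z_j in zip(alphas, z_js))
        for d in range(len(z_i))
    ]
    return updated, alphas


# Made-up 2-dimensional features: one article and two of its neighbours.
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.0, 0.0, 1.0, 0.0]       # scores depend on each neighbour's first feature
article = [0.5, 0.5]
neighbors = [[1.0, 0.0], [0.0, 1.0]]  # e.g. a keyword node and a retweeter node

new_article, alphas = gat_aggregate(article, neighbors, W, a)
print(new_article, alphas)
```

With these toy values the first neighbour receives the larger attention weight because its first feature is larger, so it contributes more to the updated article vector.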
