
A Visualization-Driven Keywords Analysis in Social Networks Based on Word2Vec

Advisor: 陳履恒

Abstract


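With the spread of smartphones, social network platforms such as PTT, INSTAGRAM, FACEBOOK, and TWITTER have proliferated, and the number of people using them keeps growing. According to surveys, PTT has about 1.5 million registered users in total, with more than 150,000 users online at peak hours. It hosts more than 20,000 boards on different topics and receives more than 20,000 new articles and 500,000 comments (推文) every day, making it one of the most heavily used Chinese-language Internet services and one of the places where online free-speech culture first took hold. Because user accounts are not especially public or transparent, comments there are less guarded than on other social platforms, so this study focuses on PTT.

First, a user of our system enters a keyword, and the system searches PTT for related articles. By analyzing the elements of each web page, we crawl all comments and store them. Next, we segment the sentences with the JIEBA word segmentation tool, filter out meaningless filler words with a STOPWORD LIST, and then train WORD2VEC on the corpus produced by these steps.

Finally, we present the analysis results visually. Previous analyses have typically shown results in two dimensions, for example as bar charts or line charts, which tend to overwhelm users and limit how much can be shown. We therefore use a force-driven method and render the results in 3D to improve operating efficiency, so that users of the system can find related information simply and quickly through interaction.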
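The crawling and preprocessing steps described above can be sketched in Python with only the standard library. This is a minimal illustration, not the thesis's code: it assumes the PTT web layout in which each comment appears as `<span class="f3 push-content">: text</span>`, and it takes an already-downloaded article page as input instead of fetching it. The `segment` argument stands in for JIEBA; in practice it would be `jieba.lcut`, while the example passes `str.split` on pre-spaced sample text so the sketch runs without third-party packages. `PushContentParser`, `build_corpus`, the sample HTML, and the stopword set are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable


class PushContentParser(HTMLParser):
    """Collect the text of every <span class="... push-content ..."> element.

    Assumption: PTT renders each comment as
    <span class="f3 push-content">: comment text</span>.
    """

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0              # > 0 while inside a push-content span
        self._buffer: list[str] = []
        self.comments: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1         # a nested span inside a comment
            return
        classes = (dict(attrs).get("class") or "").split()
        if "push-content" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Comment text is assumed to start with ": "; strip it.
                text = "".join(self._buffer).strip().lstrip(":").strip()
                if text:
                    self.comments.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def build_corpus(html: str,
                 segment: Callable[[str], Iterable[str]],
                 stopwords: set[str]) -> list[list[str]]:
    """Turn one article page into a list of token lists, one per comment."""
    parser = PushContentParser()
    parser.feed(html)
    parser.close()
    corpus = []
    for comment in parser.comments:
        tokens = [t.strip() for t in segment(comment)]
        # Drop empty tokens and anything on the stopword list.
        kept = [t for t in tokens if t and t not in stopwords]
        if kept:
            corpus.append(kept)
    return corpus


# Hypothetical sample page fragment with two comments.
SAMPLE = """
<div class="push"><span class="hl push-tag">推 </span>
<span class="f3 hl push-userid">alice</span>
<span class="f3 push-content">: 台積電 股價 又 漲 了</span>
<span class="push-ipdatetime"> 03/01 10:00</span></div>
<div class="push"><span class="f1 hl push-tag">噓 </span>
<span class="f3 hl push-userid">bob</span>
<span class="f3 push-content">: 的 了</span></div>
"""

print(build_corpus(SAMPLE, str.split, {"的", "了", "又"}))
# [['台積電', '股價', '漲']]
```

The second comment disappears entirely because every token is a stopword. The output, one token list per comment, is the sentence-per-list shape that Word2Vec trainers such as gensim accept as their `sentences` input.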

Keywords

Social network, JIEBA, WORD2VEC, force-driven

English Abstract


With the popularity of smartphones, a large number of social networking platforms have emerged, such as PTT, INSTAGRAM, FACEBOOK, and TWITTER, and the number of people using them keeps increasing. Although there are countless social platforms, we specifically studied PTT because of the following characteristics. According to surveys, PTT has about 1.5 million registered users, with more than 150,000 users online during peak hours, more than 20,000 boards on different themes, and more than 20,000 new articles and 500,000 comments every day. It is one of the most frequently used Chinese-language Internet services and one of the places where the culture of free speech on the Internet emerged. Because users' accounts are not especially public or transparent, the content posted there is less guarded than on other social platforms. These figures show how much PTT affects our lives.

To analyze keywords on PTT, we conducted the following experiment. First, a user of the system enters a keyword to search for related articles on PTT. By analyzing web page elements, we crawl and archive all the comments on those articles. We then use the JIEBA word segmentation tool to segment the sentences, a STOPWORD LIST to filter out meaningless words, and WORD2VEC to train on the corpus generated by these steps.

Finally, we present the analysis results visually. Previous work has typically presented results in two dimensions, such as bar graphs and line graphs, which tend to overwhelm users and limit the scope of what can be shown. We therefore use a force-driven method and present the results in 3D to improve the efficiency with which users operate the system. As a result, users can find relevant information easily and quickly through interaction.
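The keyword-analysis and visualization steps can be sketched as follows. The thesis trains WORD2VEC on the corpus; with gensim 4.x that would look roughly like `Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)`, with word vectors read from `model.wv`. The stdlib sketch below assumes those vectors are already available (the `VECTORS` dictionary holds hypothetical 3-dimensional toy values, not trained output), ranks words by cosine similarity to the query keyword, and places the keyword and its neighbours in 3D with a Fruchterman-Reingold-style force-directed layout. The abstract does not say which force model the system uses, so the repulsion and attraction formulas, the similarity-weighted edges, and the names `cosine`, `related_keywords`, and `force_layout_3d` are illustrative assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def related_keywords(vectors, keyword, topn=5):
    """Rank all other words by cosine similarity to `keyword`, highest first."""
    query = vectors[keyword]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != keyword]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def force_layout_3d(nodes, edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style force-directed layout in 3D.

    nodes: list of node names.
    edges: {(a, b): weight}; a larger weight pulls a and b closer together.
    Returns {node: [x, y, z]}.
    """
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}
    k = (8.0 / len(nodes)) ** (1.0 / 3.0)  # ideal spacing in the [-1, 1]^3 cube
    for step in range(iterations):
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                d = math.sqrt(sum(c * c for c in delta)) or 1e-9
                f = k * k / d
                for ax in range(3):
                    disp[a][ax] += delta[ax] / d * f
                    disp[b][ax] -= delta[ax] / d * f
        # Connected nodes attract with force weight * d^2 / k.
        for (a, b), w in edges.items():
            delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            d = math.sqrt(sum(c * c for c in delta)) or 1e-9
            f = w * d * d / k
            for ax in range(3):
                disp[a][ax] -= delta[ax] / d * f
                disp[b][ax] += delta[ax] / d * f
        # Move each node, capping the step by a linearly cooling temperature.
        temperature = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            length = math.sqrt(sum(c * c for c in disp[n])) or 1e-9
            scale = min(length, temperature) / length
            for ax in range(3):
                pos[n][ax] += disp[n][ax] * scale
    return pos


# Hypothetical toy vectors standing in for a trained model's output.
VECTORS = {
    "台積電": [0.9, 0.1, 0.0],
    "股價": [0.8, 0.3, 0.1],
    "半導體": [0.85, 0.05, 0.2],
    "晶片": [0.7, 0.0, 0.4],
    "颱風": [0.0, 0.9, 0.1],
}

neighbours = related_keywords(VECTORS, "台積電", topn=3)
nodes = ["台積電"] + [w for w, _ in neighbours]
edges = {("台積電", w): max(s, 0.0) for w, s in neighbours}
layout = force_layout_3d(nodes, edges)
for word, (x, y, z) in layout.items():
    print(f"{word}: ({x:.2f}, {y:.2f}, {z:.2f})")
```

With these toy vectors the three neighbours of 台積電 are 半導體, 股價, and 晶片, in that order. Weighting each edge by similarity means semantically closer words are pulled closer to the query keyword, and the resulting coordinates could be handed to any interactive 3D renderer.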

English Keywords

Social network, JIEBA, WORD2VEC, force-driven

