
The Application of Digital Humanities Tools for Mining Archive Subject and Emotion in Social Networks

(Original title: 運用數位人文工具進行網路論壇之檔案主題及情感探勘)

Advisor: 林巧敏

Abstract


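As archival services become increasingly oriented toward the public, archives and archival work have gradually entered public view and become a topic of discussion across society. At the same time, driven by information technology, online forums have grown steadily into an important medium through which the public expresses attitudes and spreads ideas, and they host a good deal of discussion of archival issues. Mining the content of archive-related forum posts with digital tools reveals which core topics the public currently cares about most, along with the sentiment and perceptions attached to them, and so offers guidance on how archival work can be carried out better.

This study uses the 風聞社區 (Fengwen community) as its source and collects 162 posts on archival topics published between January 1, 2019 and December 31, 2020. Text preprocessing (word segmentation, part-of-speech tagging, and word-frequency statistics) is first carried out on the NLPIR big-data semantic analysis platform. Sentiment analysis based on both a sentiment lexicon and machine learning is then performed with the Chinese Emotion Word Ontology (中文情感詞彙本體庫) and Weka. This process identifies the core issues in the archive-related posts and their positive and negative sentiment.

The results show that the issues of greatest public concern fall into two groups. The first consists of specific figures and events, centered on "Mao Zedong", "Chiang Kai-shek", "Stalin", "Sino-Soviet relations", "KMT-CPC relations", and the "War to Resist U.S. Aggression and Aid Korea". The second consists of social issues, represented by "archival work practice" and "archives opening". In terms of sentiment, the archive-related posts are clearly negative overall, and the negative sentiment is more pronounced in discussions of specific figures and events. Discussions of archival work practice, by contrast, show a relatively positive attitude and perception. Based on these findings, the study argues that archival institutions should, on the one hand, do their core work well and raise their professional standards, and on the other hand open archival information and promote archives through multiple channels and from multiple angles. Doing so would meet public needs and expectations regarding archives, build a good public image, raise the social standing and influence of archives and archival work, and help the archival profession flourish.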

Parallel Abstract


With the socialization of archival services, archives and archival work have gradually become a new topic of discussion in society. At the same time, driven by information technology, Internet forums have developed step by step into an important medium for the public to express attitudes and spread ideas, and they host many discussions of archival issues. Mining these posts with digital tools makes it possible to explore the core topics that society is most concerned about and the emotions attached to them, which can give archive departments useful guidance for improving their work.

In this study, we collected 162 posts on archival topics published between January 1, 2019 and December 31, 2020 on an Internet forum. We first used NLPIR to perform word segmentation, part-of-speech tagging, and word-frequency statistics. We then carried out text sentiment analysis based on a sentiment dictionary and machine learning, using the Chinese Emotion Word Ontology and WEKA. Through this process, the study analyzed the core issues and the emotional content of the archive-related posts.

The results show that the topics people are most concerned about fall into two types. One is specific figures and events, such as "Mao Zedong", "Chiang Kai-shek", "Stalin", "Sino-Soviet relations", "KMT-CPC relations", and "the War to Resist U.S. Aggression and Aid Korea". The other is social issues, represented by "archival work practice" and "archives opening". The posts about archives as a whole show clearly negative emotion, especially in discussions of specific figures and events. Only in discussions of archival work practice do they show a relatively positive attitude and perception. Based on these results, archive departments should, on the one hand, do their job well and raise the standard of their work. On the other hand, they should promote archives and archival work through a variety of media and formats, in order to meet public demand for and expectations of archives, build a good public image, raise the social status and influence of archives and archival work, and help the archival profession develop vigorously.
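The preprocessing and dictionary-based steps described above can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes the posts have already been segmented into tokens (the study used NLPIR for this), and `TOY_LEXICON`, the sample posts, and all function names are hypothetical stand-ins. The lexicon entries are invented for illustration and are not taken from the Chinese Emotion Word Ontology.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (polarity, intensity).
# Polarity is +1 (positive) or -1 (negative). These entries are
# illustrative only and are NOT real entries from the ontology.
TOY_LEXICON = {
    "公開": (1, 3),   # "open / disclose"
    "積極": (1, 5),   # "positive / active"
    "銷毀": (-1, 5),  # "destroy"
    "隱瞞": (-1, 7),  # "conceal"
}


def word_frequencies(tokens, stopwords=frozenset()):
    """Count how often each non-stopword token occurs."""
    return Counter(t for t in tokens if t not in stopwords)


def polarity_score(tokens, lexicon):
    """Sum polarity * intensity over the tokens found in the lexicon."""
    return sum(lexicon[t][0] * lexicon[t][1] for t in tokens if t in lexicon)


def classify(tokens, lexicon):
    """Label a post by the sign of its lexicon score."""
    score = polarity_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Two made-up, pre-segmented posts (assumed output of a word segmenter).
posts = [
    ["檔案", "公開", "工作", "積極"],
    ["檔案", "被", "銷毀", "隱瞞", "真相"],
]

freq = word_frequencies([t for p in posts for t in p], stopwords={"被"})
labels = [classify(p, TOY_LEXICON) for p in posts]
```

Summing signed intensities is only one simple way to turn a lexicon into a post-level label. The abstract says the study also used machine learning (via Weka) but does not say how the two approaches were combined, so that step is left out here.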

