  • Thesis

自然語言處理技術應用於中文網路新聞議題立場分析

Natural language processing technology applied to stance analysis of Chinese online news issues

Advisors: 林斯寅, 吳肇銘

Abstract


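Controversial issues have long been a focus of public attention, and the media play an important role in conveying information about these issues to audiences. In recent years, as information technology has advanced, more and more people receive news over the Internet rather than from television, newspapers, magazines, or radio. Media coverage may be biased, so news content can take different stances on controversial issues. With the rise of personalized recommendation systems, users of online news may receive only one-sided information on a controversial issue, which can lead to misunderstanding of the issue and even to social division. This study therefore applies natural language processing techniques to issue stance analysis of Chinese online news and builds a Chinese online news issue stance dataset from scratch, covering data collection, data cleaning, and data labeling. In the labeling stage, the study labels news headlines to identify stance-related sentences (marker sentences, 標示句); with the SVM-Linear method, the overall model performance exceeds 80%, and this model serves as the basis for an assisted labeling system for news content. For the news issue stance classification model, the training data are adjusted so that the input combines news headlines, news content, and marker sentences in different ways, which improves stance classification accuracy. When news content is the input and neutral news is included, the overall performance of BERT is better than that of SVM-Linear; when news content is the input and neutral news is excluded, the overall performance of SVM-Linear is better than that of BERT, with every metric close to or above 80%, indicating a reasonable degree of discriminative ability. Finally, using the dataset built in this study, stance analysis is performed and stance information is integrated, including headline stance, content stance, media outlet stance, and timeline analysis, to build a reading system that discloses issue stance information and helps readers gain a fuller understanding of news on controversial issues.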
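As a rough illustration of the two-class setting (no neutral news) with a linear SVM, the sketch below trains a small classifier on selectable input fields. It is not the thesis's implementation: the abstract does not state the features, the training procedure, or the library used, so the character-bigram features, the Pegasos-style stochastic subgradient training, and all names (`ARTICLES`, `char_ngrams`, `features`, `train_linear_svm`, `predict`) are assumptions, and every headline and sentence is invented toy data.

```python
import random
from collections import Counter

# Invented toy articles. stance: +1 = supportive, -1 = opposed (no neutral class).
ARTICLES = [
    {"title": "民眾支持改革", "content": "多位民眾表示支持這項改革", "stance": 1},
    {"title": "專家支持政策", "content": "專家認為政策方向正確", "stance": 1},
    {"title": "團體反對開發", "content": "團體到場抗議並反對開發", "stance": -1},
    {"title": "居民反對興建", "content": "居民擔心環境而反對興建", "stance": -1},
]

def char_ngrams(text, n=2):
    """Count character n-grams (assumed feature choice; needs no word segmenter)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def features(article, fields):
    """Sum the n-gram counts of the chosen input fields, e.g. ("title", "content")."""
    total = Counter()
    for field in fields:
        total += char_ngrams(article[field])
    return total

def score(weights, feats):
    """Linear decision value: dot product of the weights and the feature counts."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def predict(weights, feats):
    """Return +1 (supportive) for a positive score, otherwise -1 (opposed)."""
    return 1 if score(weights, feats) > 0 else -1

def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss, without a bias term."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            x, y = samples[i], labels[i]
            violated = y * score(weights, x) < 1
            for k in weights:                  # shrink: the regularisation step
                weights[k] *= 1.0 - eta * lam
            if violated:                       # hinge-loss subgradient step
                for k, v in x.items():
                    weights[k] = weights.get(k, 0.0) + eta * y * v
    return weights

# Train on headlines only; pass ("title", "content") to try another input combination.
train_x = [features(a, ("title",)) for a in ARTICLES]
train_y = [a["stance"] for a in ARTICLES]
model = train_linear_svm(train_x, train_y)
print(predict(model, char_ngrams("學者支持提案")))
```

Changing the `fields` tuple is one simple way to compare input combinations of the kind the abstract mentions. A real experiment would need a held-out test set and a much larger labeled corpus than this toy list.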

Parallel Abstract


Controversial issues have always been a focus of attention, and the media play an important role in disseminating information about these issues to the audience. With the development of information technology in recent years, more and more people use the Internet to receive news instead of television, newspapers, magazines, or radio programs. Media reports may be biased, so news content can take different stances on controversial issues. Because of the rise of personalized recommendation systems, online news users may receive only one-sided information on controversial issues, which leads to misunderstanding of those issues and even to social division. Therefore, this research uses natural language processing technology to analyze issue stance in Chinese online news and builds a Chinese online news issue stance dataset from scratch, through data collection, data cleaning, and data labeling. In the data labeling stage, this research labels news headlines to find stance-related sentences. With the SVM-Linear method, the overall performance of the model exceeds 80%, and an assisted labeling system for news content is built on this model. The stance classification model uses BERT as the main model. Its input is varied by adjusting the training dataset to improve the accuracy of news stance classification. When news content is used as the input and neutral news is excluded, the overall performance of the model is more than 70%, showing a certain degree of recognition ability. Finally, using the dataset established in this research, stance analysis was carried out and stance information was integrated, such as news headline stance, news content stance, media reporting stance, and timeline analysis, to construct a reading system for issue stance information disclosure.

