
Constructing a Deep Learning Model Using Language in Social Media: The Case Study of "Retrospective Adjustment"


Abstract


This research used Facebook posts related to the term "retrospective adjustment" (校正回歸), which first appeared in Taiwan during the COVID-19 pandemic, as its corpus, and manually coded the sentiments of 6,917 posts. We randomly divided the dataset into training (70%) and testing (30%) subsets, fine-tuned the pre-trained Chinese BERT model on the training subset, and applied the fine-tuned model to predict the sentiments in the test subset. We then compared the manual codes with the model predictions to explain the differences in terms of linguistic features. The results indicate that the model performed best on posts manually coded as "neutral," with an accuracy of 0.81, while its accuracy was only 0.64 and 0.63 on posts manually coded as "positive" and "negative," respectively. Among the errors, posts manually coded as "negative" but predicted by the model as "positive" were the most common (0.23), followed by posts manually coded as "positive" but predicted as "neutral" (0.22). Examining the linguistic features of these two groups of posts, we identified seven categories of linguistic features that, we claim, led to "negative" coding and four categories that led to "positive" coding. Moreover, both groups contained posts that could not be coded accurately without knowledge of the news and of the Facebook account owners' political or social inclinations, which we attribute to the posts' strongly public and political character in Taiwan; this may also have lowered the model's prediction accuracy. Considering that the language used in social media differs from the data employed to train current models, and that Facebook users frequently use punctuation marks and emoticons to express their moods, we argue for developing a sentiment-prediction model tailored to Taiwanese social media in order to improve prediction accuracy.
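The evaluation workflow described above — a random 70/30 split of the hand-coded posts, followed by a per-class comparison of manual codes against model predictions — can be sketched in Python. This is a minimal illustration, not the authors' actual code: the fine-tuning step itself (e.g. loading a pre-trained Chinese BERT through a library such as Hugging Face's `transformers` and training a three-class sentiment head) is omitted, and the function names, fixed seed, and English class labels are assumptions for the sketch.

```python
import random
from collections import Counter

def split_dataset(examples, train_frac=0.7, seed=42):
    """Randomly split the labeled posts into training and test subsets
    (70% / 30% in the study). The seed is illustrative, for reproducibility."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def per_class_report(gold, pred, classes=("negative", "neutral", "positive")):
    """For each manually coded class, report the proportion of posts the
    model assigned to each predicted class. The diagonal entry is the
    per-class accuracy (e.g. 0.81 for 'neutral' in the study); off-diagonal
    entries are confusion rates (e.g. 'negative' predicted as 'positive')."""
    report = {}
    for c in classes:
        idx = [i for i, g in enumerate(gold) if g == c]
        if not idx:
            continue  # class absent from the gold labels
        confusion = Counter(pred[i] for i in idx)
        total = len(idx)
        report[c] = {p: round(n / total, 2) for p, n in confusion.items()}
    return report
```

On a toy sample, `per_class_report(["neutral"]*5 + ["negative"]*4, ["neutral"]*4 + ["positive"] + ["negative"]*3 + ["positive"])` reports that 0.8 of "neutral" posts were predicted correctly and 0.25 of "negative" posts were misread as "positive" — the same kind of breakdown the study uses to locate its two largest error categories.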

