
應用構詞語法於中文評論之情感分析

Sentiment Analysis of Chinese Reviews Using Morphosyntactic Patterns

Advisor: 陳正賢

Abstract


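Sentiment analysis is one of the most frequently discussed topics in natural language processing. Sentiment classification often builds classifiers with a bag-of-words model combined with n-grams, and previous studies have also shown that non-bag-of-words features, such as grammatical and discourse features, can contribute substantially to classification performance. This study analyzes how linguistically informed morphosyntactic patterns affect the polarity of sentiment words in Chinese movie reviews, and examines whether applying them can effectively improve document-level sentiment classification. We first use pattern grammar to qualitatively identify sentiment-related patterns, and then use the Wilcoxon rank-sum test to quantitatively assess each pattern's effect on sentiment-word polarity, thereby identifying the patterns' preferences in modulating sentiment words. The results show that the patterns exhibit two sentiment-modulation preferences: intensifying the polarity of positive sentiment words and mitigating the polarity of negative sentiment words. A follow-up collexeme analysis also shows that intensifying patterns generally attract positive sentiment words, whereas mitigating patterns more strongly attract negative sentiment words. These differences reflect how native Chinese speakers modulate the sentiment polarity of their opinions in movie reviews in order to establish the credibility of their reviews. Finally, we build classifiers with Support Vector Machines and run two document sentiment classification experiments to validate the classification performance of the sentiment-related patterns against traditional bag-of-words models. In Experiment 1, we test whether linguistically informed sentiment patterns can capture more comprehensive sentiment-related grammatical information than traditional n-grams that contain sentiment words. In Experiment 2, we test whether sentiment patterns help improve the performance of the traditional bag-of-words model. Experiment 1 shows that, compared with traditional n-grams that contain sentiment words, sentiment patterns cover a wider range of the grammatical properties of sentiment words and encode important sentiment-related grammatical information more efficiently. Experiment 2 confirms that, when n-grams and sentiment words are included in the classifier, adding sentiment patterns improves the performance of the traditional bag-of-words model, with classification reaching an F1 score of 87.80%. The linguistically informed morphosyntactic patterns proposed in this study offer an alternative to the brute-force algorithms commonly used in sentiment classification models.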
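The quantitative step described above, checking whether sentiment words inside a pattern differ in polarity from those outside it, can be illustrated with a minimal sketch of a two-sided Wilcoxon rank-sum test. This is not the thesis's code: the function name `rank_sum_test` and the polarity scores are hypothetical, and the sketch uses the plain normal approximation without a tie correction. In practice a library routine such as `scipy.stats.ranksums` would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive average ranks; the variance is
    not corrected for ties, so treat this as an illustrative sketch.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j

    n1, n2 = len(x), len(y)
    # W is the sum of the ranks that belong to the first sample.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical polarity scores of sentiment words (not data from the thesis).
inside_pattern = [0.80, 0.90, 0.70, 0.95, 0.85]
outside_pattern = [0.50, 0.60, 0.40, 0.55, 0.65]
z, p = rank_sum_test(inside_pattern, outside_pattern)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A positive `z` with a small `p` would suggest that the pattern is associated with higher polarity scores, which is the kind of evidence that could mark a pattern as sentiment-intensifying.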

Parallel Abstract


Sentiment analysis is one of the most commonly discussed topics in the field of Natural Language Processing. While the traditional bag-of-words approach using n-grams is generally adopted for sentiment analysis tasks such as sentiment classification, studies have suggested that features beyond bag-of-words, such as grammatical and textual features, are crucial to the classifier's performance. In particular, this study investigates to what extent linguistically-motivated morphosyntactic patterns may contribute to sentiment classification by analyzing their impact on the sentiment polarity of lexical features such as sentiment words in Chinese online movie reviews. We adopt pattern grammar as our theoretical framework to qualitatively encode patterns, and the Wilcoxon rank-sum test to quantitatively determine significant patterns and their sentiment preferences. Our analyses show that morphosyntactic patterns demonstrate two prominent types of modulation of lexical sentiment polarity: intensifying positive lexical sentiment and mitigating negative lexical sentiment. Our post-hoc collexeme analyses of these patterns also show that sentiment-intensifying patterns attract more positive words and that sentiment-mitigating patterns attract more negative words. These preferences reveal how Chinese speakers utilize morphosyntactic patterns to modulate the sentiment in their opinions and establish their credibility in online movie reviews. Finally, we train a series of Support Vector Machine models and perform two document classification experiments to validate the effectiveness of morphosyntactic patterns in comparison to traditional bag-of-words models. In the first experiment, we examine whether our linguistically-motivated morphosyntactic patterns can capture a comparable amount of beyond-single-word information compared with sentiment-word-embedded n-grams, which are traditional n-grams that specifically contain sentiment words. In the second experiment, we test whether sentiment-modulating morphosyntactic patterns contribute to sentiment classification on top of the traditional n-gram-based model. Results of the first experiment suggest that morphosyntactic patterns can encode a wider range of the crucial morphosyntactic properties of sentiment words more efficiently than sentiment-word-embedded n-grams. The second experiment shows that morphosyntactic patterns improve the traditional n-gram-based model comprising unigrams and bigrams. Moreover, we obtained an averaged F1 score of 87.80% when combining morphosyntactic patterns with other features such as n-grams and sentiment words in the classifier. We conclude that handcrafted, linguistically-motivated morphosyntactic patterns can provide an alternative to the brute-force n-gram methods that have been commonly employed in building classifiers for sentiment classification tasks.
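To make the contrast between literal n-grams and abstract patterns concrete, here is a minimal sketch of how pattern features might be added on top of unigram and bigram counts. Everything in it is an assumption for illustration: the toy lexicon, the modifier list, the `PAT:` feature naming, and the function names are hypothetical, and the thesis's actual patterns and sentiment lexicon are not given in this abstract. Reviews are assumed to be word-segmented already.

```python
from collections import Counter

# Hypothetical toy resources, not the lexicon or patterns used in the thesis.
SENTIMENT_LEXICON = {"好看": "POS", "精彩": "POS", "難看": "NEG", "無聊": "NEG"}
MODIFIERS = {"非常": "INTENSIFIER", "超": "INTENSIFIER", "有點": "MITIGATOR"}


def ngram_features(tokens, n):
    """Count contiguous n-grams, joined with '_' to form feature names."""
    return Counter("_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pattern_features(tokens):
    """Count abstract MODIFIER+POLARITY patterns rather than literal word pairs."""
    feats = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        if prev in MODIFIERS and word in SENTIMENT_LEXICON:
            feats[f"PAT:{MODIFIERS[prev]}+{SENTIMENT_LEXICON[word]}"] += 1
    return feats


def extract_features(tokens):
    """Merge unigram, bigram, and pattern counts into one feature dictionary."""
    feats = Counter()
    feats.update(ngram_features(tokens, 1))
    feats.update(ngram_features(tokens, 2))
    feats.update(pattern_features(tokens))
    return dict(feats)


# A pre-segmented toy review: "This movie is very good."
review = ["這", "部", "電影", "非常", "好看"]
features = extract_features(review)
print(features)
```

In this toy setup, the two different bigrams `非常_好看` and `超_精彩` would both map to the single feature `PAT:INTENSIFIER+POS`, which is one way a pattern can encode the same information more compactly than word-specific n-grams. The resulting dictionaries could then be vectorized, for example with scikit-learn's `DictVectorizer`, and passed to a linear SVM such as `LinearSVC`; that pipeline is only one possible setup, not the configuration reported in the thesis.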
