  • Thesis

中文情感分析應用於PTT之研究 (A Study of Chinese Sentiment Analysis Applied to PTT)

Improved Chinese Sentiment Analysis Techniques for PTT Data

Advisor: 陳景祥

Abstract


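Many people write posts online and communicate with one another through text, especially the younger generation. Emotions arise when people interact, and writers blend their own emotions into what they write to a greater or lesser degree, for example in the views and feelings of the online public toward a particular event or issue. PTT (批踢踢實業坊), hosted at National Taiwan University, is one of today's representative discussion-board sites. Its heavy user traffic, large number of sub-boards, distinctive system architecture, and modes of user interaction have produced many popular posts and novel internet slang, which the media often use as news material. Some words in online posts carry a corresponding emotion, which may be positive or negative; this is generally called word polarity. In text mining, manual annotation is the most accurate way to label word polarity, but it is also the most costly. This study adopts an adjusted PMI method in the hope of automating the labeling of word polarity. For article-level sentiment analysis, the study adopts an unsupervised approach, so no labeled training articles are needed. It requires only words with positive or negative polarity, negation words, adverbs, and the like, combined with the part-of-speech patterns of sentences, to build an article sentiment model and thereby classify the sentiment of articles.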

Parallel Abstract


Many people today communicate with each other by writing articles online, especially the younger generation. In doing so, they express their emotions in what they write. These articles include comments on social events, issues, and so on. PTT is one of the representative forum websites in Taiwan today. Its features include heavy user traffic, many categories of sub-forums, a distinctive system architecture, and particular ways in which users interact. As a result, PTT generates many popular articles and internet catchphrases, which are often picked up and amplified by the news media. Words in internet articles carry corresponding emotions, which may be positive, negative, or neutral; this property is referred to as semantic orientation. In text mining, manual tagging is so far the most accurate way to judge semantic orientation, but it comes at a higher cost. In this study, we use an adjusted Pointwise Mutual Information (PMI) method to tag semantic orientations automatically. Moreover, we use an unsupervised learning method for sentiment modeling, which requires no labeled training data. Using only negation words, adverbs, adjectives, and positive and negative words, together with the part-of-speech patterns of sentences, we aim to classify the emotions of articles on PTT.
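The idea of tagging word polarity with PMI can be illustrated with a minimal sketch. The abstract does not say how the PMI is adjusted, so the code below uses the standard PMI definition and a common seed-word formulation of semantic orientation (PMI with positive seeds minus PMI with negative seeds). The function name `so_pmi`, the additive `smoothing` term, the document-level co-occurrence window, and the toy tokens are all assumptions made for illustration, not the method of the thesis.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.5):
    """Score each non-seed word by a seed-based semantic orientation.

    docs: list of token lists (one list per post, already segmented).
    Returns a dict mapping word -> score; > 0 leans positive, < 0 negative.
    Assumption: two words co-occur if they appear in the same post.
    """
    n_docs = len(docs)
    seeds = set(pos_seeds) | set(neg_seeds)
    df = Counter()  # number of posts containing each word
    co = Counter()  # number of posts containing both (word, seed)

    for tokens in docs:
        vocab = set(tokens)
        df.update(vocab)
        for seed in vocab & seeds:
            for word in vocab:
                if word != seed:
                    co[(word, seed)] += 1

    def pmi(word, seed):
        # PMI(w, s) = log2( P(w, s) / (P(w) * P(s)) ).
        # The joint count is smoothed (an assumption) so that pairs
        # that never co-occur do not produce log(0).
        joint = (co[(word, seed)] + smoothing) / n_docs
        return math.log2(joint / ((df[word] / n_docs) * (df[seed] / n_docs)))

    scores = {}
    for word in df:
        if word in seeds:
            continue
        pos = sum(pmi(word, s) for s in pos_seeds if df[s] > 0)
        neg = sum(pmi(word, s) for s in neg_seeds if df[s] > 0)
        scores[word] = pos - neg
    return scores


# Toy, made-up posts; real input would be segmented PTT articles.
posts = [
    ["push", "nice", "great"],
    ["nice", "great"],
    ["lousy", "awful"],
    ["lousy", "awful", "ugly"],
    ["nice", "push"],
]
scores = so_pmi(posts, pos_seeds=["great"], neg_seeds=["awful"])
polarity = {w: ("positive" if s > 0 else "negative") for w, s in scores.items()}
```

In this sketch a word's sign depends entirely on which seed words it tends to share posts with, so the choice of seeds and of the co-occurrence window matters a great deal. A lexicon produced this way could then feed the rule-based, unsupervised article scoring described in the abstract.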


Cited By


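吳登揚 (2017). 基於不同主題的中文情感分析比較 [A comparison of Chinese sentiment analysis across different topics] (Master's thesis, 淡江大學 [Tamkang University]). 華藝線上圖書館 (Airiti Library). https://doi.org/10.6846/TKU.2017.01083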
