
Building a Concept-level Sentiment Dictionary Based on Commonsense Knowledge

Advisor: 許永真

Abstract


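This study uses existing sentiment dictionaries and a commonsense network to build a concept-level sentiment dictionary with sentiment values. The main idea is to take the concepts that already have sentiment values and use the commonsense network to derive sentiment values for the other concepts in the network. Sentiment dictionaries play an important role in sentiment analysis research. A common approach is to use a sentiment dictionary to find the sentiment words in a document, determine the sentiment value of each sentiment unit from the dictionary's information and the surrounding modifiers, and finally aggregate all sentiment units in the document to decide its overall sentiment. The vocabulary size of the sentiment dictionary therefore has a major impact on the results of sentiment analysis. Propagating sentiment values over a commonsense network rests on our assumption that related concepts influence each other's sentiment; moreover, the units in a commonsense network are concepts, which are more expressive than single words. I propose two propagation models. The first is a random-walk-like method that passes sentiment values along the commonsense network. The second is an iterative regression model, which predicts the sentiment values of concepts that originally had none from each concept's own features and the feature distribution of its neighbors in the network. Finally, I combine the two, using the output of iterative regression as the starting point of the random-walk-like method. In addition, I propose evaluating sentiment dictionaries by polarity accuracy, Kendall $\tau$ distance, and average-maximum ratio. Compared with the conventional mean-error evaluation, the proposed metrics make evaluation data easier to collect. I collected two evaluation datasets with Amazon Mechanical Turk to evaluate sentiment dictionaries. Under these metrics, the proposed method achieves better results than SenticNet, the current benchmark sentiment dictionary.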
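The propagation idea above can be illustrated with a small sketch. This is a hypothetical example, not the thesis's actual model. It assumes the commonsense network is given as weighted in-links per concept and keeps seed concepts fixed at their dictionary values. It then applies a label-spreading-style update: every other concept combines a (1 - alpha) share of its starting value with alpha times the in-link-normalized weighted mean of its neighbors' values. The function name `propagate_sentiment`, the `alpha` parameter, and the example concepts and scores are all invented for illustration. Passing regression predictions as `init` loosely mimics using iterative regression output as the starting point.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical graph format: concept -> list of (source_concept, weight) in-links.
InLinks = Dict[str, List[Tuple[str, float]]]


def propagate_sentiment(
    in_links: InLinks,
    seeds: Dict[str, float],
    init: Optional[Dict[str, float]] = None,
    alpha: float = 0.5,
    max_iters: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Label-spreading-style sentiment propagation (illustrative sketch only).

    Seed concepts keep their dictionary values. Every other concept becomes
    (1 - alpha) * prior + alpha * (in-link-normalized weighted neighbor mean),
    where the prior is its starting value from `init` (e.g. regression
    predictions) or 0.0 if none is given.
    """
    concepts = set(in_links) | set(seeds)
    for links in in_links.values():
        concepts.update(src for src, _ in links)

    prior = {c: 0.0 for c in concepts}
    if init:
        prior.update({c: v for c, v in init.items() if c in prior})
    prior.update(seeds)
    values = dict(prior)

    for _ in range(max_iters):
        new_values = dict(values)  # Jacobi-style: read old values, write new ones
        delta = 0.0
        for c in concepts:
            if c in seeds:
                continue  # seed values stay clamped
            links = in_links.get(c, [])
            total = sum(w for _, w in links)
            if total <= 0:
                continue  # no in-links: keep the prior value
            # Divide by total in-link weight so the neighbor term is a weighted mean.
            mean = sum(values[src] * w for src, w in links) / total
            new_values[c] = (1 - alpha) * prior[c] + alpha * mean
            delta = max(delta, abs(new_values[c] - values[c]))
        values = new_values
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    # Invented toy network and seed scores, for illustration only.
    graph = {
        "birthday party": [("happy", 1.0), ("cake", 1.0)],
        "cake": [("delicious", 1.0)],
        "traffic jam": [("angry", 2.0)],
    }
    seeds = {"happy": 0.9, "delicious": 0.8, "angry": -0.7}
    print(propagate_sentiment(graph, seeds))
```

The neighbor term is a weighted mean and seeds are clamped. With alpha < 1, each sweep is therefore a contraction, and the iteration settles to a fixed point. Raising alpha toward 1 lets the graph structure outweigh the starting values.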

Parallel Abstract


Sentiment analysis has been a hot topic in recent years, and sentiment dictionaries play an important role in the field. A sentiment dictionary contains a set of sentiment units and the sentiment information of each unit. A common approach to sentiment analysis is to use a sentiment dictionary to match the sentiment units in a document and then use the sentiment information the dictionary provides to decide the sentiment of the document. Therefore, the vocabulary size of the dictionary has a great influence on the result of sentiment analysis. Moreover, since concepts are more expressive than single words, I focus on building a concept-level sentiment dictionary with a large vocabulary. To build sentiment dictionaries with a large vocabulary, people usually compute the sentiment values of new units automatically from those in existing dictionaries. ConceptNet was chosen as the propagation ontology based on the assumption that semantically related concepts share common sentiment. I tried random-walk-like methods for propagation. In addition, an iterative regression method and a two-step combination of the two methods are proposed to improve the results. Instead of mean error, I propose using polarity accuracy, Kendall $\tau$ distance, and average-maximum ratio to evaluate sentiment dictionaries, with evaluation data collected from Amazon Mechanical Turk. The results show that the proposed two-step method with in-link normalization achieves the best result. It also outperforms the state-of-the-art sentiment dictionary in terms of both polarity accuracy and Kendall $\tau$ distance; in particular, Kendall $\tau$ distance decreases by 22% relative to it.
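Two of the proposed evaluation measures can be sketched as follows. The exact definitions are not given in this abstract, so this is an assumed version. Polarity accuracy is taken as the share of shared, non-neutral gold concepts whose predicted sign matches. The normalized Kendall $\tau$ distance is taken as the fraction of gold-ordered concept pairs that the dictionary orders differently. The function names, the tie-handling rules, and the toy scores are hypothetical. Average-maximum ratio is omitted because its definition is not stated here.

```python
from itertools import combinations
from typing import Dict


def _sign(x: float) -> int:
    """Return +1, 0, or -1."""
    return (x > 0) - (x < 0)


def polarity_accuracy(pred: Dict[str, float], gold: Dict[str, float]) -> float:
    """Share of shared, non-neutral gold concepts whose predicted sign matches (assumed definition)."""
    keys = [k for k in gold if k in pred and gold[k] != 0]
    if not keys:
        raise ValueError("no comparable concepts")
    hits = sum(_sign(pred[k]) == _sign(gold[k]) for k in keys)
    return hits / len(keys)


def kendall_tau_distance(pred: Dict[str, float], gold: Dict[str, float]) -> float:
    """Normalized Kendall tau distance (assumed tie handling).

    Pairs tied in the gold data are skipped; a predicted tie on a pair the
    gold data orders strictly is counted as a disagreement.
    """
    keys = sorted(k for k in gold if k in pred)
    discordant = comparable = 0
    for a, b in combinations(keys, 2):
        g = _sign(gold[a] - gold[b])
        if g == 0:
            continue
        comparable += 1
        if _sign(pred[a] - pred[b]) != g:
            discordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return discordant / comparable


if __name__ == "__main__":
    # Invented gold ratings and dictionary scores, for illustration only.
    gold = {"gift": 2.0, "sunny day": 1.0, "traffic jam": -1.0}
    pred = {"gift": 0.5, "sunny day": 0.9, "traffic jam": -0.3}
    print(polarity_accuracy(pred, gold), kendall_tau_distance(pred, gold))
```

A lower Kendall $\tau$ distance means the dictionary's ranking agrees more closely with the human judgments. Only relative order and sign matter for these two measures, so annotators can give ordinal judgments instead of calibrated scores. This is one plausible reason such evaluation data is easier to collect than data for mean error.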

