Multilingual Emotion Classifier using Unsupervised Pattern Extraction from Microblog Data

The expansion of social network services has been accompanied by a corresponding growth in user-generated content on the web, and microblogs have become a common and popular channel of expression. In recent years the opinions posted there have gained importance, because when analyzed and interpreted correctly they provide information useful for many purposes. One such analysis is understanding how people feel about or react to a specific topic. The most basic approaches attempt to determine whether a given text expresses a positive or negative opinion toward a given subject. A more detailed alternative is to classify texts into defined emotion-specific categories, which makes it crucial to devise algorithms that efficiently identify the emotions expressed in opinionated content. Traditional emotion classifiers require extracting high-dimensional feature representations, which are computationally expensive to process and can reduce classifier accuracy. In this thesis we propose an unsupervised graph-based algorithm to extract emotion-bearing patterns from microblog posts. Using the extracted patterns, we define a classification method that efficiently identifies the emotions expressed in posts without depending on a predefined emotion dictionary or ontology. The system also takes into account that, in these global, connected networks, content originates from different geographic locations, cultures, and languages. The pattern extraction method is what enables it to perform successfully across languages and domains. The experimental results demonstrate that the proposed system can handle English, Spanish, and French tweets with accuracy, generality, adaptability, and minimal supervision.

Keywords

social networks; emotion

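Parallel Abstract

The rapid growth of social network services has led users to generate more and more social data, making microblogs an increasingly common and popular channel of expression. In recent years these user comments have become increasingly important, because when analyzed and interpreted correctly they are data that serve multiple purposes. In social research, one such analysis is understanding how people feel about or react to a specific topic. The most basic approach is to gauge whether people feel favorably or unfavorably toward a topic. A more detailed approach is to classify the data into specific, defined categories; being able to identify the emotions in a text helps many kinds of analysis. Traditional emotion classification methods are costly and their accuracy is poor. This study proposes an unsupervised emotion recognition technique that relies on graph analysis to automatically identify text patterns carrying emotional meaning, without depending on any predefined emotion dictionary or emotion ontology. The technique also works across geographic locations, cultures, and languages, and can recognize emotions wherever training data is available. The experimental results show that the technique can accurately process English, Spanish, and French emotion data, and confirm its accuracy and generality.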

Parallel Keywords

emotion; sentiment; social networks; microblog; Twitter; patterns

