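Sentiment analysis is a popular and rapidly growing research area that uses text analysis techniques to identify authors' opinions and emotional states. Among its approaches, methods based on sentiment words and sentiment lexicons are especially well known. However, a sentiment word's polarity and behavior are not necessarily the same across texts from different domains. This study examines how sentiment words behave in three domains (real estate, hotels, and restaurants). It finds that some sentiment words not only differ in polarity across domains but can even carry conflicting polarities. In addition, some "non-sentiment words" that are absent from sentiment lexicons can become domain-dependent words in a particular domain and thus affect sentiment classification. We then propose several term-weighting schemes that incorporate this information into an existing sentiment classification system. Using LIBSVM with a linear kernel, we classify the real estate, hotel, and restaurant corpora with 5-fold cross-validation. The experimental results show that the proposed TF-S-S-IDF method improves document classification performance in every domain, and t-tests confirm that the improvements are significant. The method combines TF-IDF, the NTU Sentiment Dictionary, and the domain-specific sentiment orientation (SO) degree computed from each corpus, which strengthens the weights of both domain-dependent and domain-independent sentiment words.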
Sentiment analysis research aims to explore the emotional states of writers. Such analysis depends heavily on the application domain, so analyzing the sentiment of articles from different domains may yield different results. In this study, we focus on Traditional and Simplified Chinese corpora from three domains (real estate, hotels, and restaurants), examine the polarity degree of words in each domain, and propose methods to capture cross-domain sentiment differences. Finally, we apply the results to sentiment classification using LIBSVM with a linear kernel. The experiments show that the proposed TF-S-S-IDF method effectively improves sentiment classification performance. It integrates TF-IDF, the NTU Sentiment Dictionary, and the sentiment orientation degree of each word in the specific domain.
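The following minimal Python sketch illustrates the weighting idea under explicit assumptions. The abstract does not give the formulas for SO or TF-S-S-IDF. The functions `domain_so` and `tf_s_s_idf` below are therefore hypothetical stand-ins, not the authors' definitions. `domain_so` computes a smoothed log-ratio of positive versus negative document frequencies. `tf_s_s_idf` multiplies TF-IDF by a boost of 1 + lexicon membership + |SO|. The toy `lexicon` set stands in for the NTU Sentiment Dictionary, and the data are made up.

```python
import math
from collections import Counter

def domain_so(docs, labels):
    """Hypothetical domain sentiment orientation (SO): smoothed log2 ratio of the
    fraction of positive vs. negative documents containing each word.
    > 0 leans positive in this domain, < 0 leans negative."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        for w in set(tokens):  # document frequency, not raw term counts
            target[w] += 1
    so = {}
    for w in set(df_pos) | set(df_neg):
        p = (df_pos[w] + 1) / (n_pos + 2)  # add-one (Laplace) smoothing
        q = (df_neg[w] + 1) / (n_neg + 2)
        so[w] = math.log2(p / q)
    return so

def tf_s_s_idf(docs, lexicon, so):
    """Hypothetical TF-S-S-IDF: TF-IDF scaled by 1 + [w in lexicon] + |SO(w)|,
    so lexicon words and domain-dependent words both receive extra weight."""
    n = len(docs)
    df = Counter(w for tokens in docs for w in set(tokens))
    vectors = []
    for tokens in docs:
        vec = {}
        for w, tf in Counter(tokens).items():
            idf = math.log((1 + n) / (1 + df[w])) + 1  # smoothed idf, never zero
            boost = 1.0 + (1.0 if w in lexicon else 0.0) + abs(so.get(w, 0.0))
            vec[w] = tf * idf * boost
        vectors.append(vec)
    return vectors

# Toy, made-up hotel-domain data (1 = positive, 0 = negative).
docs = [["clean", "quiet", "room"], ["clean", "staff"],
        ["noisy", "room"], ["dirty", "noisy"]]
labels = [1, 1, 0, 0]
lexicon = {"clean", "dirty"}  # stand-in for a general-purpose sentiment lexicon

so = domain_so(docs, labels)
vectors = tf_s_s_idf(docs, lexicon, so)
print(round(so["noisy"], 3))  # -1.585: not in the lexicon, but negative in this domain
```

In this toy run, "noisy" is missing from the lexicon but still gets a boosted weight through its domain SO. This mirrors the abstract's point about domain-dependent words. A word such as "room" appears equally in both classes, so it gets SO = 0 and no boost. SO is estimated from labeled data. Under cross-validation, it should therefore be computed on the training folds only, to avoid label leakage. For classification, the per-document dictionaries would be mapped to integer feature indices and written in LIBSVM's sparse `label index:value` format. They would then be evaluated with `svm-train -t 0 -v 5` (linear kernel, 5-fold cross-validation).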