  • Degree thesis


Using the Subjective Analysis and PMI-IR in the Accuracy of Sentiment Analysis - A Case Study of Movie Comments

Advisor: 翁頌舜


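As the Internet has flourished, users are confronted with an enormous volume of information, and the traditional approach of classifying by topic can no longer filter it effectively. Search engines return so many results that a searcher cannot read them all and may end up seeing only a partial, biased set of opinions. To help users organize large numbers of reviews and obtain better information, researchers began to study sentiment analysis, a method for classifying reviews automatically. Sentiment analysis is an application of text mining that classifies documents according to their positive or negative sentiment. A review is often interspersed with objective statements of fact, which leads to classification errors. This is especially common in movie reviews, which are therefore regarded as one of the hardest kinds of review to classify (Turney, 2002). Deciding whether a sentence is subjective or objective thus becomes very important. The focus of this study is how to set aside the objective part of a review (plot description) and analyze the subjective part (the author's personal impressions) in order to improve the accuracy of sentiment analysis. This study works with Chinese movie reviews, and its framework has two phases. The subjectivity analysis phase excludes the objective (plot description) part and extracts the subjective sentences as the representative subjective sentences of each review. The sentiment analysis phase then performs sentiment analysis on those representative sentences. The experimental results show that this framework does improve the accuracy of sentiment analysis and that classification works best when the first 2,000 sentiment feature words are used. The study further examines the effect of the opposing word pairs used by PMI-IR.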


With the rapid growth of the Internet, users face a huge amount of information, and traditional topic-based classification can no longer filter it effectively. Search engines return so many results that searchers cannot browse them all and may see only a partial, biased set of views. To help users organize large numbers of comments and obtain better information, scholars began to study sentiment analysis, a method for classifying comments automatically. Sentiment analysis is an application of text mining that classifies articles by their positive or negative sentiment. A comment is often mixed with objective descriptions of fact, which results in erroneous classification. This is particularly common in movie reviews, which are regarded as one of the most difficult categories of comment to classify (Turney, 2002). Determining whether a sentence is subjective or objective therefore becomes very important. The focus of this study is how to avoid the objective part of a review (the plot narrative) and analyze only the subjective part (the author's personal view) in order to improve the accuracy of sentiment analysis. This study analyzes Chinese movie reviews. The research framework is divided into two phases: a subjectivity analysis phase and a sentiment analysis phase. First, subjectivity analysis excludes the objective (plot narrative) part and extracts the subjective sentences as the representative sentences of each comment. Sentiment analysis is then applied to those representative sentences. Experimental results show that this architecture improves classification accuracy and that classification is best when the first 2,000 features are used. They also show how the opposing word pairs used by PMI-IR affect the classification results.
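The two-phase framework described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the PMI-IR semantic orientation of Turney (2002), in which a phrase is scored by how much more often it co-occurs with a positive reference word than with a negative one. The function names, the smoothing constant, the rule of averaging sentence scores, and the toy subjectivity test and scores in the usage example are all hypothetical and chosen only for illustration.

```python
import math


def pmi_ir_orientation(hits_near_pos, hits_near_neg, hits_pos, hits_neg,
                       smoothing=0.01):
    """Semantic orientation in the style of Turney (2002).

    hits_near_pos / hits_near_neg: hit counts for the phrase occurring near
    the positive / negative reference word (the opposing word pair).
    hits_pos / hits_neg: hit counts for the reference words on their own.
    The smoothing constant (an assumption here) avoids division by zero.
    """
    if hits_pos <= 0 or hits_neg <= 0:
        raise ValueError("reference word hit counts must be positive")
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def classify_review(sentences, is_subjective, orientation_of):
    """Two-phase classification of one review.

    Phase 1 keeps only the sentences judged subjective.
    Phase 2 averages their orientation scores (hypothetical decision rule).
    If no sentence is judged subjective, all sentences are used instead.
    """
    subjective = [s for s in sentences if is_subjective(s)]
    scored = subjective if subjective else list(sentences)
    if not scored:
        raise ValueError("review has no sentences")
    mean_score = sum(orientation_of(s) for s in scored) / len(scored)
    return "positive" if mean_score > 0 else "negative"


# Toy usage with hypothetical stand-ins for both phases.
review = ["The villain destroys the city.", "I loved the ending."]
toy_scores = {review[0]: -2.0, review[1]: 1.5}


def toy_is_subjective(sentence):
    return sentence.startswith("I ")


with_filter = classify_review(review, toy_is_subjective, toy_scores.get)
without_filter = classify_review(review, lambda s: True, toy_scores.get)
```

In this toy case the plot sentence carries a strongly negative score although the reviewer liked the film, so the label depends on whether the subjectivity filter is applied. That is the kind of error the subjectivity analysis phase is meant to avoid.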


