  • Degree thesis

應用情感分析技術於電影評論分類與評分系統 — 以Yahoo!奇摩電影為例

Application of Sentiment Analysis Technology in Movie Reviews Opinion Classification and Ranking System -Taking Yahoo! Movie for Example

Advisor: 鄭宇庭
The full text will be available for download from 2025/06/30.

Abstract


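Over the past decade the Internet has grown rapidly. Compared with the 1980s, when Web 2.0 was not yet in widespread use, the way people communicate has gradually shifted from writing letters to a specific recipient toward voluntarily publishing and sharing personal opinions on public web platforms and forums: for example, consumers' impressions and experiences after using a product, or reviews and opinions about films, television, and news media. As mobile devices have become more convenient and widespread, people who cannot make up their minds or struggle to choose often consult those with experience or the views of past consumers. By searching for keywords online, they obtain information from forums, public review sites, news media, and personal blogs. Sites such as PTT (台大批踢踢實業坊) and PIXNET (痞客邦) provide information across many domains, whereas review sites such as Yahoo!奇摩電影 (Yahoo! Movies) and IMDB serve users looking for information in a specific domain, here movie reviews, news articles, and film synopses. The prevalence of the Internet has also attracted businesses, yielding useful business intelligence and supporting effective marketing decisions. Internet users, in turn, can gather subjective reviews from many sources as a reference before a purchase or before watching a movie.

In light of this, this thesis designs a movie-specific opinion sentiment classifier and review scoring system for the short reviews on Yahoo!奇摩電影, divided into two parts: model training and validation on a test set. The training part comprises data processing, manual extraction of opinion words and attribute words, construction of the related lexicons, calculation of opinion-word scores, and building the trained model. First, the training data are segmented with the CKIP word segmentation system, and opinion words carrying clear sentiment and movie-related attribute words are collected by manual annotation to build a sentiment feature lexicon; a degree lexicon and a negation lexicon are then built from the words in the training reviews that intensify or negate meaning. Next, using the opinion, degree, and negation lexicons, five sentiment features are defined: "extremely positive", "positive", "neutral", "negative", and "extremely negative". These features are extracted from each training review and converted into a feature vector, and a sentiment classification model is trained with the supervised machine learning method SVM (Support Vector Machine). In the validation part, the trained support vector machine classifies each review as positive or negative, and the results are compared with the star ratings given on the review site, yielding an overall accuracy of 85.55% and an AUC of 92.55%, which indicates that the system has good discriminative power and reliability. Finally, the system automatically generates a score for each movie from the content of its reviews and combines it with the scores for the movie's four attribute categories, giving users a direct and reliable reference before they watch a movie.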
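
The lexicon-based feature extraction described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy lexicon entries, the weights, the threshold `strong`, the rule that a review with no opinion words counts as neutral, and the names `bucket` and `extract_features` are all assumptions made for illustration. The input is assumed to be a review that has already been segmented into tokens (the thesis uses CKIP for that step).

```python
from collections import Counter

# Hypothetical toy lexicons; the thesis builds its own by manual annotation.
OPINION_SCORES = {"好看": 1.0, "精彩": 1.5, "無聊": -1.0, "難看": -1.5}
DEGREE_WEIGHTS = {"很": 1.5, "超": 2.0, "有點": 0.5}
NEGATION_WORDS = {"不", "沒有"}

FEATURES = ["extremely_positive", "positive", "neutral",
            "negative", "extremely_negative"]


def bucket(score, strong=2.0):
    """Map a signed opinion score to one of the five sentiment features.

    The threshold `strong` is an assumed value, not taken from the thesis.
    """
    if score >= strong:
        return "extremely_positive"
    if score > 0:
        return "positive"
    if score == 0:
        return "neutral"
    if score > -strong:
        return "negative"
    return "extremely_negative"


def extract_features(tokens):
    """Count the five sentiment features in one segmented review."""
    counts = Counter()
    weight, negated = 1.0, False
    for tok in tokens:
        if tok in DEGREE_WEIGHTS:
            weight *= DEGREE_WEIGHTS[tok]      # intensify or soften the next opinion word
        elif tok in NEGATION_WORDS:
            negated = not negated              # flip polarity of the next opinion word
        elif tok in OPINION_SCORES:
            score = OPINION_SCORES[tok] * weight
            if negated:
                score = -score
            counts[bucket(score)] += 1
            weight, negated = 1.0, False       # modifiers apply to one opinion word only
    if not counts:
        counts["neutral"] = 1                  # assumed rule: no opinion word means neutral
    return [counts[name] for name in FEATURES]


print(extract_features(["劇情", "很", "精彩"]))  # degree word boosts a positive opinion word
print(extract_features(["不", "好看"]))          # negation flips a positive opinion word
```

Vectors of this shape could then be passed to any SVM implementation, for instance scikit-learn's `sklearn.svm.SVC`; the source does not say which library was used, so that choice is also an assumption.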

Parallel Abstract


Over the past ten years, the Internet has grown rapidly. Compared with the 1980s, when Web 2.0 was not yet widely used, communication between people has gradually shifted from writing letters to a specific recipient to voluntarily posting and sharing personal comments on public web platforms and forums, for example consumers’ impressions and experiences after using a product, or reviews and opinions on movies, television, and news media. As mobile devices become more convenient and widespread, people who are unable to decide or are torn between options often consult those with experience or past consumers. Information can be obtained from forums, public review sites, news media, and personal blogs. PTT and PIXNET are websites that provide information across many fields, whereas review sites such as “Yahoo! Movies” and “IMDB” provide movie-specific reviews, news, and synopses. The prevalence of the Internet has also drawn in businesses, yielding useful business intelligence and a reference for effective marketing decisions.
In light of this, we designed an opinion classification and ranking system specifically for movies by analyzing the short reviews on Yahoo! Movies; it consists of a training part and a verification part. The training part includes data processing, manual extraction of opinion words and attribute words, construction of the related lexicons, calculation of opinion-word scores, and building the trained model. First, we segmented the data into words with the Chinese Knowledge and Information Processing (CKIP) system and then collected opinion words with clear sentiment and movie-related attribute words to build a sentiment feature lexicon. We also took negation terms and degree terms into consideration and built lexicons for them. Using these lexicons, we defined five sentiment features: “extremely positive”, “positive”, “neutral”, “negative”, and “extremely negative”. We converted the sentiment features into feature vectors and trained a Support Vector Machine (SVM) classification model on them. In the verification part, the system classifies each review as positive or negative using the trained SVM model. We compared the classification results with the star ratings provided on the review site and obtained an accuracy of 85.55% and an AUC of 92.55%, which indicates that the system discriminates well. Finally, every movie is automatically scored according to its reviews, and this score, together with the scores for the movie’s four attribute categories, provides users with a direct and reliable reference before watching the movie.
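
As a rough illustration of the two reported metrics, the sketch below computes accuracy and AUC for a binary sentiment classifier in plain Python. The star-to-label rule (`stars >= 4` counts as positive), the decision threshold of 0.5, the function names, and the sample data are hypothetical; the abstract does not state how star ratings were mapped to positive and negative classes.

```python
def stars_to_label(stars, positive_from=4):
    """Turn a star rating into a binary label (assumed rule, not from the thesis)."""
    return 1 if stars >= positive_from else 0


def accuracy(labels, predictions):
    """Fraction of reviews whose predicted class matches the star-derived label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive review receives a
    higher classifier score than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: star ratings and classifier scores for four reviews.
labels = [stars_to_label(s) for s in [5, 4, 1, 2]]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, predictions), auc(labels, scores))
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures are reported separately.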

References


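I. Chinese-language references
(1) CKIP Chinese Word Segmentation System, Academia Sinica, 2012: http://ckipsvr.iis.sinica.edu.tw/.
(2) IMDB review website, 1990: https://www.imdb.com/.
(3) 李淑惠, 2014, 應用文字探勘技術於口碑分析之研究, Master's thesis, Department of Information Management, Soochow University.
(4) 邱鴻達, 2011, 意見探勘在中文電影評論之應用, Master's thesis, 資訊與工程研究所, National Chiao Tung University.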
