An Ensemble-Based Framework for Word-of-Mouth Sentiment Classification

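Current sentiment analysis methods offer no definitive method or logic for choosing a combination of sentiment lexicon and algorithm. Ensemble learning, which combines multiple classifiers, has become an approach for achieving higher prediction accuracy. As a result, it is increasingly applied to improve classification and prediction in many research fields, including medicine, sentiment analysis, word-of-mouth analysis, and weather forecasting. This study conducts experiments on word-of-mouth data from IMDB and Hotels.com, using common ensemble learning methods (Bagging, Boosting, and Stacking) together with sentiment lexicons (SenticNet 2.0, SenticNet 3.0, SenticNet 4.0, SentiWordNet 1.0, and SentiWordNet 3.0). It examines how the accuracy of sentiment analysis predictions can be improved, and how classification with ensembles compares with classification without them. The resulting classification framework can serve as a reference for later experiments on datasets of the same type: it indicates which combination of sentiment lexicon and classifier gives the best classification results, which shortens experiment time because not every combination of classifier and lexicon has to be tested. In addition, the experiments with five sentiment lexicons show that no single sentiment lexicon can express the sentiment of different datasets. Adopting multiple sentiment lexicons and selecting the one with the best classification results therefore expresses the sentiment of consumer reviews better than relying on a single lexicon.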
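The idea of choosing the best-performing lexicon for each dataset can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the three toy lexicons, their polarity values, and the sample reviews are all invented here and are not taken from SenticNet, SentiWordNet, IMDB, or Hotels.com. The `majority_vote` function is a plain voting ensemble, used as a simpler stand-in for Bagging, Boosting, or Stacking.

```python
from collections import Counter

# Hypothetical toy lexicons: word -> polarity in [-1, 1].
# Values are invented for illustration, NOT taken from SenticNet or SentiWordNet.
LEXICONS = {
    "lexicon_a": {"great": 0.8, "clean": 0.5, "boring": -0.6, "dirty": -0.7},
    "lexicon_b": {"great": 0.6, "moving": 0.7, "boring": -0.8, "slow": -0.4},
    "lexicon_c": {"clean": 0.6, "friendly": 0.7, "dirty": -0.8, "rude": -0.9},
}

# Invented sample reviews standing in for two word-of-mouth domains.
MOVIE_REVIEWS = [
    ("great moving story", "pos"),
    ("boring and slow plot", "neg"),
    ("slow start", "neg"),
    ("moving ending", "pos"),
]
HOTEL_REVIEWS = [
    ("clean room friendly staff", "pos"),
    ("dirty room rude staff", "neg"),
    ("rude reception", "neg"),
    ("friendly service", "pos"),
]


def classify(text, lexicon):
    """Label a review 'pos' or 'neg' by summing the polarities of its words."""
    score = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return "pos" if score >= 0 else "neg"


def lexicon_accuracy(name, reviews):
    """Fraction of (text, label) pairs the named lexicon labels correctly."""
    lexicon = LEXICONS[name]
    correct = sum(classify(text, lexicon) == label for text, label in reviews)
    return correct / len(reviews)


def best_lexicon(reviews):
    """Return the name of the lexicon with the highest accuracy on this dataset."""
    return max(LEXICONS, key=lambda name: lexicon_accuracy(name, reviews))


def majority_vote(text):
    """Combine the lexicon-based classifiers by simple majority voting."""
    votes = Counter(classify(text, lexicon) for lexicon in LEXICONS.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for domain, reviews in [("movies", MOVIE_REVIEWS), ("hotels", HOTEL_REVIEWS)]:
        print(domain, best_lexicon(reviews))
```

On this toy data a different lexicon scores highest in each domain, which mirrors the abstract's point that one lexicon does not fit every dataset. With real lexicons and corpora the accuracies would have to be estimated on held-out data rather than on the data used for selection.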

Keywords

Sentiment analysis; word-of-mouth; ensemble learning

Parallel Abstract

There is no definitive method or logic for choosing which machine learning approaches and sentiment lexicons are best for data analysis. Ensemble learning is generally thought to improve the accuracy of analysis and prediction by combining multiple single classifiers. Consequently, it is increasingly applied in prediction and classification to give professionals a firmer basis for solving problems in medicine, sentiment analysis, weather forecasting, and other fields. In this study, we use four word-of-mouth (WOM) datasets crawled from the internet (IMDB, Hotels.com, TripAdvisor, and Amazon) for the experiments. We focus on Stacking, Bagging, and Boosting and on how to improve prediction accuracy. The resulting classification framework can guide later experiments on datasets of the same type. First, using the framework reduces experiment time because not every combination needs to be tested. Second, experiments with five sentiment lexicons show that a single sentiment lexicon cannot express the real sentiment of different datasets. Therefore, using multiple sentiment lexicons is better than using a single one for all domains.

Parallel Keywords

Sentiment analysis; Word-of-mouth; Ensemble learning

