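Current sentiment analysis methods offer no definitive method or logic for choosing a combination of sentiment lexicon and algorithm. Ensemble learning, which combines multiple classifiers, has become an approach to analysis and prediction with higher accuracy. It is therefore increasingly applied to improve predictive classification and analysis, with related studies in many fields, including medicine, sentiment analysis, word-of-mouth analysis and weather forecasting. This study runs experiments on different word-of-mouth data from IMDB and Hotels.com, using common ensemble learning methods such as Bagging, Boosting and Stacking together with sentiment lexicons such as SenticNet 2.0, SenticNet 3.0, SenticNet 4.0, SentiWordNet 1.0 and SentiWordNet 3.0. It examines how the accuracy of analysis and prediction in sentiment analysis can be improved, and how classification with ensembles compares with classification without them. The resulting classification framework can serve as a reference in later experiments on datasets of the same type, indicating which combination of sentiment lexicon and classifier yields the best classification results; this shortens experiment time because not every combination of classifier and sentiment lexicon has to be tested. In addition, the study uses five sentiment lexicons, and the experimental results show that a single sentiment lexicon cannot express the sentiment of different datasets. Using several sentiment lexicons and selecting the one with the best classification result therefore expresses the sentiment of consumer reviews better than relying on a single lexicon.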
There is no definitive method or logic for choosing which machine learning approaches and sentiment lexicons work best in data mining for data analysis. Ensemble learning is generally thought to improve the accuracy of analysis and prediction by combining multiple single classifiers. It is therefore increasingly applied to prediction and classification, giving professionals a stronger basis for solving problems in medicine, sentiment analysis, weather forecasting and other fields. In this study, we use four word-of-mouth (WOM) datasets crawled from the internet (IMDB, Hotels.com, TripAdvisor and Amazon) for our experiments. We focus on the Stacking, Bagging and Boosting methods and on how to improve the accuracy of prediction. The resulting classification framework can guide later experiments on datasets of the same type. First, with the framework, experiment time is reduced because not every combination needs to be tested. Second, our experiments with five sentiment lexicons show that a single sentiment lexicon cannot express the real sentiment of different datasets. It is therefore better to use multiple sentiment lexicons than a single one for all domains.
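The idea of comparing several sentiment lexicons per dataset and combining their decisions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy lexicons (`lexicon_a`, `lexicon_b`, `lexicon_c`) and their scores are invented and are not entries from SenticNet or SentiWordNet, the two tiny review sets are made up, and the combiner is plain majority voting, which is simpler than the Bagging, Boosting and Stacking methods used in the study.

```python
from collections import Counter

# Hypothetical toy lexicons: word -> polarity score.
# Illustrative values only, not taken from SenticNet or SentiWordNet.
LEXICONS = {
    "lexicon_a": {"great": 0.8, "boring": -0.6, "clean": 0.5, "dirty": -0.7},
    "lexicon_b": {"great": 0.6, "slow": -0.4, "friendly": 0.7, "dirty": -0.5},
    "lexicon_c": {"boring": -0.8, "slow": -0.5, "friendly": 0.6, "clean": 0.4},
}

# Made-up labelled reviews standing in for two word-of-mouth domains.
MOVIE_REVIEWS = [
    ("great plot", "pos"),
    ("boring and slow", "neg"),
    ("slow but great", "pos"),
    ("boring", "neg"),
]
HOTEL_REVIEWS = [
    ("clean room friendly staff", "pos"),
    ("dirty room", "neg"),
    ("slow service", "neg"),
    ("friendly", "pos"),
]


def lexicon_predict(text, lexicon):
    """Label a review from the sum of its word scores (unknown words score 0)."""
    score = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return "pos" if score >= 0 else "neg"


def accuracy(predict, dataset):
    """Fraction of reviews whose predicted label matches the given label."""
    correct = sum(predict(text) == label for text, label in dataset)
    return correct / len(dataset)


def best_lexicon(dataset):
    """Name of the lexicon with the highest accuracy on this dataset."""
    def score(name):
        return accuracy(lambda t: lexicon_predict(t, LEXICONS[name]), dataset)
    return max(LEXICONS, key=score)


def majority_vote(text):
    """Combine the lexicon-based classifiers by simple majority voting."""
    votes = Counter(lexicon_predict(text, lex) for lex in LEXICONS.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for name, data in [("movies", MOVIE_REVIEWS), ("hotels", HOTEL_REVIEWS)]:
        print(name, "best lexicon:", best_lexicon(data),
              "| majority-vote accuracy:", accuracy(majority_vote, data))
```

On this toy data the best-scoring lexicon differs between the two review sets, which mirrors the claim that no single lexicon suits every domain. A real experiment would replace the toy dictionaries with actual lexicon resources and the voting step with trained ensemble classifiers.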