Opinion analysis applies the theory and computational techniques of natural language processing to determine the subjective orientation of opinionated text and sentences posted on the Internet. Chinese reviews occasionally mix Chinese and English, yet prior research has rarely proposed methods that handle the two languages together.

This study proposes a Chinese-English bilingual opinion analysis method. We build a bilingual opinion dictionary, evaluate combinations of opinion dictionaries together with syntactic features, classify reviews with machine learning, and apply feature selection to obtain an optimized feature set.

The experiments show that the choice of opinion dictionary combination affects classification performance. On a Chinese corpus, the bilingual method with an optimized set of 21 features reaches an overall accuracy of 74.98% under cross-validation and 77.10% on the open test. We also examine the proportion of English data in the Chinese corpus: the higher that proportion, the greater the effect of the bilingual method.

The contributions of this study are threefold: (1) a domain-specific Chinese opinion lexicon for cosmetics and skincare reviews, extracted from our experiments, (2) a comparison of how different combinations of opinion dictionaries perform, and (3) confirmation that assessing bilingual opinion orientation helps machine learning classify Chinese reviews.
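As a rough illustration of how dictionary-based features could be drawn from a review that mixes Chinese and English, the following Python sketch counts lexicon hits per language and measures the share of English tokens. It is a minimal sketch under stated assumptions, not the method of this study: the four toy lexicons, the feature names, the greedy longest-match lookup, and the functions `count_zh_hits` and `extract_features` are all hypothetical, and the 21 features used in the experiments are not listed in this abstract.

```python
import re

# Hypothetical toy lexicons for illustration only; the dictionaries
# evaluated in the study are not reproduced here.
ZH_POS = {"好用", "喜歡", "保濕"}
ZH_NEG = {"過敏", "失望", "油膩"}
EN_POS = {"good", "love", "nice"}
EN_NEG = {"bad", "sticky", "poor"}

EN_WORD = re.compile(r"[A-Za-z]+")         # runs of Latin letters
ZH_RUN = re.compile(r"[\u4e00-\u9fff]+")   # runs of CJK unified ideographs


def count_zh_hits(text, lexicon):
    """Count lexicon entries in the Chinese runs by greedy longest match."""
    max_len = max(map(len, lexicon))
    hits = 0
    for run in ZH_RUN.findall(text):
        i = 0
        while i < len(run):
            # Try the longest candidate first, then shorter ones.
            for n in range(min(max_len, len(run) - i), 0, -1):
                if run[i:i + n] in lexicon:
                    hits += 1
                    i += n
                    break
            else:
                i += 1  # no entry starts here; move one character on
    return hits


def extract_features(text):
    """Return per-language polarity counts and the English token ratio."""
    en_tokens = [w.lower() for w in EN_WORD.findall(text)]
    zh_chars = sum(len(run) for run in ZH_RUN.findall(text))
    total = zh_chars + len(en_tokens)
    return {
        "zh_pos": count_zh_hits(text, ZH_POS),
        "zh_neg": count_zh_hits(text, ZH_NEG),
        "en_pos": sum(w in EN_POS for w in en_tokens),
        "en_neg": sum(w in EN_NEG for w in en_tokens),
        "en_ratio": len(en_tokens) / total if total else 0.0,
    }


if __name__ == "__main__":
    print(extract_features("這瓶乳液很好用,但是有點sticky"))
```

A feature dictionary of this kind could then be turned into a vector and passed to any standard classifier, with feature selection deciding which entries to keep. The English ratio is included only because the abstract reports that the effect of the bilingual method grows with the proportion of English data.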