A Study of Chinese-English Bilingual Opinion Analysis Using Machine Learning and Multiple Dictionaries

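Opinion analysis uses the theory and computational techniques of natural language processing to identify the subjective orientation expressed in opinionated texts and sentences on the Internet. Chinese reviews occasionally switch between Chinese and English. However, few previous studies have proposed methods that handle text in which different languages coexist.

This study proposes a Chinese-English bilingual opinion analysis method. It designs a Chinese-English bilingual opinion dictionary, evaluates the individual opinion dictionaries alongside syntactic features, classifies reviews with machine learning, and finally applies feature selection to obtain an optimized feature set.

Experimental results show that the choice of opinion-dictionary combination affects classification performance. The bilingual opinion analysis method was applied to a Chinese corpus and the feature set was then optimized. With 21 features, machine learning reached an overall accuracy of 74.98% in cross-validation and 77.10% in an open test. This thesis also examines the proportion of English data in the Chinese corpus. The results show that the higher the proportion of English data, the greater the influence of the bilingual opinion analysis method.

The main contributions of this thesis are threefold:
- it proposes domain-specific opinion words for beauty and skincare;
- it compares the effectiveness of different opinion-dictionary combinations;
- it shows that assessing bilingual opinion orientation helps machine learning.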
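The abstract describes two steps: scoring reviews that mix Chinese and English against a bilingual opinion dictionary, and measuring how much English a review contains. The Python sketch below shows one way such features might be computed; it is not the thesis's implementation. It rests on these assumptions:
- The tiny lexicons `ZH_LEXICON` and `EN_LEXICON` are hypothetical stand-ins for the real dictionaries.
- Chinese runs are matched with forward maximum matching rather than a proper word segmenter.
- `en_ratio` is a rough proxy that counts each CJK character and each English word as one unit.

```python
import re

# Hypothetical toy lexicons (polarity +1 / -1); the thesis's real
# general-purpose and cosmetics-domain dictionaries are not reproduced here.
ZH_LEXICON = {"保濕": 1, "好用": 1, "推薦": 1, "油膩": -1, "過敏": -1}
EN_LEXICON = {"good": 1, "love": 1, "nice": 1, "bad": -1, "sticky": -1}

CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")  # runs of CJK unified ideographs
EN_WORD = re.compile(r"[A-Za-z]+")          # runs of Latin letters


def match_chinese(run: str, lexicon: dict[str, int]) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max(len(w) for w in lexicon)
    hits, i = [], 0
    while i < len(run):
        for size in range(min(max_len, len(run) - i), 0, -1):
            if run[i:i + size] in lexicon:
                hits.append(run[i:i + size])
                i += size
                break
        else:
            i += 1  # no entry starts here; advance one character
    return hits


def bilingual_features(text: str) -> dict[str, float]:
    """Count positive/negative dictionary hits per language plus an English-share proxy."""
    cjk_runs = CJK_RUN.findall(text)
    zh_hits = [w for run in cjk_runs for w in match_chinese(run, ZH_LEXICON)]
    en_tokens = [t.lower() for t in EN_WORD.findall(text)]
    en_hits = [t for t in en_tokens if t in EN_LEXICON]
    n_cjk = sum(len(run) for run in cjk_runs)
    return {
        "zh_pos": sum(1 for w in zh_hits if ZH_LEXICON[w] > 0),
        "zh_neg": sum(1 for w in zh_hits if ZH_LEXICON[w] < 0),
        "en_pos": sum(1 for t in en_hits if EN_LEXICON[t] > 0),
        "en_neg": sum(1 for t in en_hits if EN_LEXICON[t] < 0),
        # Rough proxy: each CJK character and each English word counts as one unit.
        "en_ratio": len(en_tokens) / max(1, n_cjk + len(en_tokens)),
    }


print(bilingual_features("這瓶乳液很保濕，但有點油膩 but I still love it"))
```

For the sample review, the sketch finds:
- one positive Chinese hit (保濕);
- one negative Chinese hit (油膩);
- one positive English hit (love).

`en_ratio` comes out as 5/17: five English words against twelve CJK characters. In a real pipeline these counts would be a few of the many features passed to the classifier.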

Keywords

Opinion analysis; opinion mining; sentiment analysis; sentiment dictionary; machine learning

English Abstract

Opinion analysis aims to determine the subjective orientation of opinionated texts on the Internet using the computational techniques of natural language processing. Chinese reviews posted online occasionally mix Chinese and English, yet prior research has paid little attention to such bilingual expression in opinion analysis.

This paper proposes an approach to bilingual opinion analysis of Chinese reviews that combines multiple sentiment dictionaries, machine learning, and feature selection. We found that accuracy is strongly affected by the choice of general sentiment dictionaries. With an optimized set of 21 features, the proposed system achieved an overall accuracy of 74.98% in cross-validation and 77.10% in an open test. We also examined how the proportion of English data in the Chinese reviews affects the system, and found that its influence grows as that proportion increases.

The contributions of this paper are threefold:
1. a new Chinese sentiment dictionary for the cosmetics review domain, extracted from our experiments;
2. a comparison of the effects of different sentiment dictionaries;
3. evidence that bilingual opinion analysis improves the performance of machine learning on Chinese reviews.
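The abstract reports that feature selection produced an optimized 21-feature set evaluated by cross-validation. It does not say which classifier or selection algorithm was used. The Python sketch below therefore substitutes simple, clearly hypothetical choices to show the general shape of the procedure:
- a nearest-centroid classifier;
- interleaved k-fold cross-validation;
- greedy forward selection, which keeps adding whichever feature most raises cross-validated accuracy and stops when no feature helps.

```python
from statistics import mean

# Each row is (feature_vector, label). The classifier and the selection
# strategy below are illustrative stand-ins, not the thesis's own choices.


def centroid_accuracy(train, test, feats):
    """Fit one centroid per class on the chosen features; return accuracy on `test`."""
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in feats]

    def predict(x):
        # Assign the class whose centroid is nearest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def cv_accuracy(data, feats, k=5):
    """Mean accuracy over k interleaved folds (fold i holds rows i, i+k, i+2k, ...)."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(centroid_accuracy(train, folds[i], feats))
    return mean(scores)


def forward_select(data, n_features, k=5):
    """Greedily add the feature that most improves CV accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: cv_accuracy(data, selected + [f], k) for f in remaining}
        feat = max(remaining, key=scores.get)  # first feature wins ties
        if scores[feat] <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = scores[feat]
    return selected, best
```

Consider a toy dataset in which only feature 0 tracks the label:

`data = [([float(i % 2), float(i % 3), float((7 * i) % 5)], i % 2) for i in range(20)]`

On this data, `forward_select(data, n_features=3)` returns `([0], 1.0)`. Two things would change in practice. The stand-in classifier would be replaced by the study's actual model. The open-test data would also be kept outside the selection loop so that the reported open-test accuracy is not biased by feature selection.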

English Keywords

Opinion Analysis; Opinion Mining; Sentiment Analysis; Sentiment Dictionary; Machine Learning


