
A Study of Sentiment Analysis on Small Volumes of Training Data: Case Study of Movie Comments

Advisor: 蕭瑞祥

Abstract


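Deep learning methods are becoming increasingly common in sentiment analysis research. Supervised learning methods depend heavily on labeled training corpora: by learning from such a corpus, the machine produces a well-performing classifier. When faced with large amounts of unlabeled data, earlier studies mostly adopted semi-supervised or unsupervised learning as the main approach, but their effectiveness and efficiency did not surpass those of supervised learning. Supervised learning, however, requires the data to be labeled by hand, and manual annotation consumes a great deal of labor and time. This thesis trains a prediction model on a small amount of labeled data and examines how gradually reducing the amount of data affects model training. The model then serves as the main method for automatically labeling unlabeled data collected from the web through prediction, and the newly labeled data are used as additional training data for sentiment analysis. We implement a text sentiment analysis model based on a small amount of training data, verify the performance differences with model evaluation metrics, and compare the results with related papers. The study finds that, for binary classification of movie reviews, the deep learning model with the added labeling module outperforms the other models on every evaluation metric.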

Parallel Abstract


Deep learning methods are becoming increasingly popular in sentiment analysis research. Supervised learning relies heavily on a labeled training corpus to produce an accurate classifier. When earlier studies encountered large amounts of unlabeled data, they generally adopted semi-supervised or unsupervised learning as the main method, but its effectiveness and efficiency did not exceed those of supervised learning. Labeling the data manually, however, takes a great deal of time and human effort. This research trains a prediction model on a small amount of labeled data, examines how gradually reducing the amount of data affects model training, and uses the model as the main method for labeling unlabeled data. We then use the newly labeled data to build a sentiment analysis model based on a small amount of training data. Our study found that, in binary classification of movie reviews, the deep learning model with the labeling module outperforms the models reported in other papers.
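The labeling step described in both abstracts is a form of self-training (pseudo-labeling): a model trained on the small labeled set predicts labels for unlabeled reviews, and confident predictions are added back as training data. The sketch below is illustrative only and is not the thesis's implementation. The thesis uses a deep learning model, whereas this sketch swaps in a plain multinomial Naive Bayes classifier so that it runs with the Python standard library alone. The names `NaiveBayes` and `pseudo_label`, the whitespace tokenizer, the confidence `threshold`, and the toy reviews are all hypothetical assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        # Sum log prior and log likelihoods, skipping words never seen in training.
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        # Normalize log scores into probabilities (softmax with max subtraction).
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def pseudo_label(labeled, unlabeled, threshold=0.9):
    """One self-training round (hypothetical helper): train on the small labeled
    set, keep predictions whose probability >= threshold, retrain on the union."""
    texts, labels = zip(*labeled)
    seed_model = NaiveBayes().fit(texts, labels)
    added = []
    for text in unlabeled:
        proba = seed_model.predict_proba(text)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:
            added.append((text, label))
    texts, labels = zip(*(list(labeled) + added))
    return NaiveBayes().fit(texts, labels), added


if __name__ == "__main__":
    # Toy reviews, invented for illustration only.
    labeled = [
        ("a wonderful moving film", "pos"),
        ("great acting and a great story", "pos"),
        ("boring and far too long", "neg"),
        ("a terrible waste of time", "neg"),
    ]
    unlabeled = ["wonderful story great film", "terrible boring plot", "the plot"]
    model, added = pseudo_label(labeled, unlabeled, threshold=0.75)
    print(added)  # [('wonderful story great film', 'pos'), ('terrible boring plot', 'neg')]
    print(model.predict("terrible plot"))  # neg
```

A higher `threshold` adds fewer but more reliable pseudo-labels. In this toy run the review "the plot" contains no word seen during training, so both classes receive probability 0.5 and the review is left unlabeled.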
