Research and Implementation of Sentiment Analysis Based on Machine Learning and Deep Learning

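With the rapid progress of deep learning, deep neural networks have brought revolutionary changes to many fields, including computer vision and natural language processing. As deep learning applications spread, they are being built into the intelligent services of more and more users and applications, and they increasingly shape everyday life. Compared with traditional artificial intelligence, current deep learning methods require little domain expertise: given questions and their corresponding answers as training data, a computer can learn a deep learning model that answers new questions. Collecting and preparing large amounts of training data is therefore especially important in deep learning. This thesis proposes a machine-learning-based method for labeling large amounts of unlabeled sentiment data. A machine learning model is trained on a small amount of data and then used to infer the sentiment categories of a large dataset. The machine-labeled data is used to train deep learning models, and the different sentiments of words are extracted during that training to build various emotion lexicons. The work consists of the following main steps. First, web crawlers collect large amounts of data from many categories on the Internet. With digitalization, people now post their views on current issues on major online platforms such as the PTT forum, which is already organized into boards by category. Second, because PTT comments carry push (推) and boo (噓) tags, and each discussion board classifies the topics it covers, every collected item can be assigned a topic label matching its board and a positive/negative label, so the training data is labeled automatically. Finally, a two-layer Bayes classifier is trained separately on the topic-labeled data and on the positive/negative-labeled data, yielding a topic classification model and a sentiment (positive/negative) classification model. Data gathered from elsewhere on the Internet can then be passed through these models so that it is automatically assigned topic labels and positive/negative labels, and the labeled data can be used to train various deep learning models. In short, this thesis proposes an algorithm that automatically collects data and automatically labels it, addressing the heavy manual annotation effort normally required to train deep learning models.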
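The auto-labeling and two-classifier steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes comments are already word-segmented, reads the "two layers" as two Bayes classifiers trained separately (one for topic, one for sentiment), assumes a multinomial naive Bayes with add-one smoothing, and uses a hypothetical `auto_label` helper, a hypothetical `PUSH_TAG_TO_SENTIMENT` mapping, and toy data. PTT's neutral arrow tag (→) is simply skipped for the sentiment classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from PTT comment tags to sentiment labels:
# "推" (push) -> positive, "噓" (boo) -> negative. The neutral "→" tag is skipped.
PUSH_TAG_TO_SENTIMENT = {"推": "pos", "噓": "neg"}


def auto_label(comments):
    """Split (board, tag, tokens) comments into topic- and sentiment-labeled examples.

    The board name serves as the topic label; the push/boo tag gives the
    sentiment label. Comments with any other tag get a topic label only.
    """
    topic_data, sentiment_data = [], []
    for board, tag, tokens in comments:
        topic_data.append((tokens, board))
        if tag in PUSH_TAG_TO_SENTIMENT:
            sentiment_data.append((tokens, PUSH_TAG_TO_SENTIMENT[tag]))
    return topic_data, sentiment_data


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in examples:
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(label)
            for w in tokens:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # smoothed log P(w | label)
                score += math.log((self.word_counts[label][w] + 1)
                                  / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def label_unlabeled(texts, topic_model, sentiment_model):
    """Attach predicted (topic, sentiment) labels to crawled, unlabeled texts."""
    return [(tokens, topic_model.predict(tokens), sentiment_model.predict(tokens))
            for tokens in texts]


# Toy, pre-tokenized PTT comments: (board, tag, tokens).
comments = [
    ("Stock", "推", ["漲", "好", "賺"]),
    ("Stock", "噓", ["跌", "慘", "賠"]),
    ("Baseball", "推", ["全壘打", "好", "贏"]),
    ("Baseball", "噓", ["輸", "慘", "失誤"]),
    ("Baseball", "→", ["比賽", "明天"]),
]
topic_data, sentiment_data = auto_label(comments)
topic_model = NaiveBayes().fit(topic_data)          # first classifier: topic
sentiment_model = NaiveBayes().fit(sentiment_data)  # second classifier: positive/negative
results = label_unlabeled([["漲", "好"], ["輸", "失誤"]], topic_model, sentiment_model)
print(results)
```

On this toy data the two unlabeled texts receive the labels ("Stock", "pos") and ("Baseball", "neg"). The resulting (tokens, topic, sentiment) triples are the kind of machine-labeled data the abstract proposes feeding into deep learning models.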

Keywords

Artificial intelligence; Machine learning; Sentiment analysis; Emotion lexicon; Bayes classifier

English Abstract

With the rapid advancement of deep learning, deep neural networks have revolutionized many fields such as computer vision and natural language processing, and as deep learning applications become widespread they are being integrated into more and more services. Compared with traditional artificial intelligence, current deep learning methods only need questions and their corresponding answers as training data for a computer to learn a deep learning model that answers new questions. Collecting and preparing large amounts of training data is therefore particularly important in deep learning. This thesis proposes a machine-learning-based method for analyzing large amounts of emotional data: a machine learning model is trained on a small amount of labeled data so that it can classify the emotion categories of a much larger dataset. The data collected from the Internet and labeled by this model is then used to train a deep learning model. The work consists of the following major steps. First, web crawlers collect large amounts of data of different types from the Internet. Second, labeled data is obtained from the PTT forum. Finally, topic classification and emotion models are trained with machine learning on the labeled PTT data. This thesis proposes a technique for automatically collecting and labeling data, which addresses the large amount of manual labeling otherwise required to train deep learning models. The extracted emotional vocabulary can also be used when designing deep learning models, offering alternative training approaches from different perspectives.
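The abstract mentions extracted emotional vocabulary. The thesis extracts word emotions during deep learning training; the sketch below instead swaps in a simpler, clearly different technique for illustration: ranking words by their add-one smoothed log-odds between positive and negative comments, computed from the same kind of word counts a naive Bayes model keeps. The function name `build_emotion_lexicon`, the `"pos"`/`"neg"` labels, the `top_k` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def build_emotion_lexicon(examples, top_k=3):
    """Rank words by smoothed log-odds log(P(w|pos) / P(w|neg)).

    examples: list of (tokens, label) with label in {"pos", "neg"}.
    Returns (positive_words, negative_words), each a list of (word, score).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in examples:
        counts[label].update(tokens)
    vocab = set(counts["pos"]) | set(counts["neg"])
    v = len(vocab)
    total_pos = sum(counts["pos"].values())
    total_neg = sum(counts["neg"].values())

    scores = {}
    for w in vocab:
        # add-one smoothing keeps words seen in only one class finite
        p_pos = (counts["pos"][w] + 1) / (total_pos + v)
        p_neg = (counts["neg"][w] + 1) / (total_neg + v)
        scores[w] = math.log(p_pos / p_neg)

    # Sort by score, breaking ties alphabetically so the output is deterministic.
    positive = sorted(((w, s) for w, s in scores.items() if s > 0),
                      key=lambda kv: (-kv[1], kv[0]))[:top_k]
    negative = sorted(((w, s) for w, s in scores.items() if s < 0),
                      key=lambda kv: (kv[1], kv[0]))[:top_k]
    return positive, negative


# Toy, pre-tokenized comments already labeled by push/boo tags.
examples = [
    (["好", "讚", "棒"], "pos"),
    (["好", "開心"], "pos"),
    (["爛", "糟"], "neg"),
    (["爛", "難過", "好"], "neg"),
]
positive, negative = build_emotion_lexicon(examples, top_k=3)
print(positive)
print(negative)
```

A word such as "好" that appears on both sides gets a smaller score than words seen only in positive comments, which is the property that makes log-odds a reasonable first-pass lexicon filter.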

English Keywords

Artificial Intelligence; Machine learning; Sentiment analysis; Emotional vocabulary; Bayesian Classifier

