Using Noisy Information for Classification Problems Based on Transfer Learning

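Obtaining a large, rich set of correctly labeled data has always been time-consuming and labor-intensive. Automated methods can label massive amounts of data, but the accuracy of those labels is questionable. We therefore propose an algorithm that uses a small amount of accurate data together with a large amount of noisy data. Instances that have been labeled two or more times serve as a bridge for deriving a weight for each instance and a transformation for the feature values. We evaluate the algorithm on three synthetic datasets and on one practical real-world problem: sentiment diffusion prediction on novel topics. Unlike traditional sentiment prediction, prediction on novel topics is harder because historical text data are lacking. The experiments show that the proposed algorithm outperforms the other methods on all four datasets.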

Keywords

Transfer Learning; Sentiment Prediction

Parallel Abstract

In general, both the quality of the data (its accuracy) and the quantity of the data (its amount) significantly affect the quality of a supervised learning model. In real-world applications, however, it is not always feasible to assume that a large amount of high-quality data can be obtained. This research considers the situation in which only a small amount of accurate training data is available, and aims to design a transfer-learning-based approach that uses a larger amount of noisy training data (noisy in both labels and features) to improve learning quality. The problem is non-trivial because the distribution of the noisy training data differs from that of the testing data. In this thesis, we propose a novel transfer learning algorithm, Noise-Label Transfer Learning (NLTL), to solve the problem. We exploit the label and feature information in both the accurate and the noisy data, transferring the features into the same domain and adjusting the weights of instances for learning. The experimental results show that NLTL outperforms the existing approaches.

Parallel Keywords

Transfer Learning; Feature Transfer; Sentiment Prediction; Novel Topics
