Active Sample Selection on Data Streams Based on Classification Feedback

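In this thesis, we modify traditional pool-based active sampling methods and propose a multi-criteria active sampling method for streaming data. The method uses a query function to decide whether to query the true label of each unlabeled instance as it arrives. The query function combines two sampling criteria, exploitation and exploration: exploitation measures the uncertainty of an instance with a query-by-committee (voting committee) algorithm, and exploration measures the similarity of an instance with maximum likelihood estimation. The weights of the two criteria are updated in every sampling round based on a KL-divergence computation. We run experiments on five datasets with different characteristics and verify that our method has a significant effect both on correcting classifier accuracy and on discovering unknown classes.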

Keywords

Active sample selection; streaming data

Parallel Abstract

This paper presents an active learning method for multi-class classification on stream-based datasets. We revise traditional multi-criteria methods, which are designed for pool-based datasets and cannot be applied directly to stream-based environments. A query function consisting of an exploitation criterion and an exploration criterion makes a query decision for each received instance. We reformulate the query-by-committee (QBC) algorithm for exploitation and use the concept of likelihood estimation for exploration. KL-divergence is used to update the weights of the criteria according to changes in the data distribution. To validate the effectiveness of our method, we run experiments on several datasets. The results show that our method is effective not only at sampling informative instances to query, but also at discovering unseen classes, even when a class accounts for only a small proportion of the dataset.
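As a rough illustration of the query function described above, the following Python sketch combines an exploitation score and an exploration score into one stream-based query decision. It is a minimal sketch under stated assumptions, not the thesis's formulation: the normalized vote entropy, the diagonal-Gaussian novelty score, the mapping from KL-divergence to weights, the threshold, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter


def vote_entropy(votes, n_classes):
    """Exploitation: normalized vote entropy of the committee's predicted labels.

    Returns 0 when all committee members agree (assumes n_classes >= 2).
    """
    counts = Counter(votes)
    total = len(votes)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_classes)


def novelty(x, mean, var):
    """Exploration: 1 minus an unnormalized diagonal-Gaussian likelihood of x.

    Assumption: mean and var summarize the instances labeled so far.
    """
    sq_dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return 1.0 - math.exp(-0.5 * sq_dist)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def update_weights(old_dist, new_dist):
    """Turn the KL-divergence between two histograms into criterion weights.

    Assumption: a larger distribution change shifts weight toward exploration.
    """
    shift = kl_divergence(new_dist, old_dist)
    w_explore = 1.0 - math.exp(-shift)
    return 1.0 - w_explore, w_explore


def should_query(votes, n_classes, x, mean, var, weights, threshold=0.5):
    """Query the true label when the weighted score reaches the threshold."""
    w_exploit, w_explore = weights
    score = (w_exploit * vote_entropy(votes, n_classes)
             + w_explore * novelty(x, mean, var))
    return score >= threshold
```

In a streaming loop, `should_query` would be called once per arriving instance, and `update_weights` once per sampling round, using histograms of the data seen before and after that round.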

Parallel Keywords

Active Learning; Stream-based

