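In this thesis, we modify traditional active sampling methods designed for pool-based environments and propose a multi-criteria active sampling method for stream-based data. The method uses a query function to decide whether to query the true label of each received unlabeled instance. The query function combines two sampling criteria, exploitation and exploration. Exploitation uses the query-by-committee (QBC) algorithm to measure an instance's uncertainty, and exploration uses maximum likelihood estimation to measure data similarity. The weights of the two criteria are updated in every sampling round based on the KL divergence. Experiments on five datasets with different characteristics confirm that the method is markedly effective both in improving classifier accuracy and in discovering unknown classes.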
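The query decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Vote entropy stands in for the QBC uncertainty measure. Per-class Gaussians with diagonal covariance, fitted by maximum likelihood, stand in for the similarity model. The names `vote_entropy`, `fit_gaussians`, `novelty`, `should_query` and the default `threshold` of 0.5 are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical model: class label -> (mean vector, diagonal variance vector)
Params = Dict[Hashable, Tuple[List[float], List[float]]]


def vote_entropy(votes: Sequence[Hashable], n_classes: int) -> float:
    """Exploitation score (assumed QBC measure): committee disagreement in [0, 1]."""
    total = len(votes)
    counts = Counter(votes)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    max_h = math.log(min(n_classes, total))  # upper bound on the vote entropy
    return h / max_h if max_h > 0 else 0.0


def fit_gaussians(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> Params:
    """MLE of a per-class mean and diagonal variance from the labeled data."""
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    params: Params = {}
    for label, rows in groups.items():
        n, d = len(rows), len(rows[0])
        mean = [sum(r[j] for r in rows) / n for j in range(d)]
        # Variance floor avoids division by zero for constant features.
        var = [max(sum((r[j] - mean[j]) ** 2 for r in rows) / n, 1e-6)
               for j in range(d)]
        params[label] = (mean, var)
    return params


def novelty(x: Sequence[float], params: Params) -> float:
    """Exploration score: 1 - best likelihood ratio p(x|c) / p(mean_c|c)."""
    best = 0.0
    for mean, var in params.values():
        d2 = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mean, var))
        best = max(best, math.exp(-0.5 * d2))
    return 1.0 - best  # near 1 when x resembles no known class


def should_query(votes, n_classes, x, params, w_exploit, w_explore,
                 threshold=0.5) -> bool:
    """Weighted query function: request the label if the score is high enough."""
    score = (w_exploit * vote_entropy(votes, n_classes)
             + w_explore * novelty(x, params))
    return score >= threshold


X = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = [0, 0, 0, 1, 1, 1]
params = fit_gaussians(X, y)
print(should_query([0, 0, 0], 2, [0.0, 0.1], params, 0.5, 0.5))    # False
print(should_query([0, 1, 0], 2, [10.0, -10.0], params, 0.5, 0.5))  # True
```

The first instance sits inside a known class and the committee agrees, so it is skipped. The second lies far from every labeled class and splits the committee, so it is queried. Because the two scores are combined linearly, `w_exploit` and `w_explore` are where a per-round weight update would plug in.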
This paper presents an active learning method for multi-class classification on stream-based datasets. We revise traditional multi-criteria methods, which are designed for pool-based datasets and cannot be applied directly to stream-based environments. A query function consisting of an exploitation criterion and an exploration criterion decides whether to query the label of each received instance. We reformulate the query-by-committee (QBC) algorithm for exploitation and use likelihood estimation for exploration. The weights of the two criteria are updated with the KL divergence to track changes in the data distribution. To validate the method, we run experiments on five datasets with different characteristics. The results show that our method is effective not only at selecting informative instances to query but also at discovering unseen classes, even when a class accounts for only a small proportion of the dataset.
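One possible way to turn the KL divergence into criterion weights is sketched below. The mapping is an assumption for illustration only. It compares smoothed class-frequency histograms from the previous and current sampling rounds and sets the exploration weight to 1 − exp(−KL). A larger distribution shift therefore moves weight toward exploration. Which labels are counted (queried or predicted) is left open. The function names and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def histogram(labels: Sequence[Hashable], classes: Sequence[Hashable],
              alpha: float = 1e-3) -> List[float]:
    """Smoothed class-frequency distribution; alpha keeps every bin > 0."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    return [(counts[c] + alpha) / total for c in classes]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def update_weights(prev_labels: Sequence[Hashable],
                   curr_labels: Sequence[Hashable],
                   classes: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (w_exploit, w_explore) for the next round (hypothetical rule)."""
    p = histogram(curr_labels, classes)  # distribution in the current round
    q = histogram(prev_labels, classes)  # distribution in the previous round
    kl = kl_divergence(p, q)
    w_explore = 1.0 - math.exp(-kl)      # 0 if unchanged, approaches 1 under large shift
    return 1.0 - w_explore, w_explore


# No change between rounds: all weight stays on exploitation.
print(update_weights([0, 0, 1, 1], [0, 0, 1, 1], [0, 1]))  # (1.0, 0.0)
# A previously unseen class 2 dominates the current round.
print(update_weights([0, 0, 1, 1], [2, 2, 2, 2], [0, 1, 2]))
```

Mapping KL into [0, 1) with 1 − exp(−KL) keeps the two weights summing to 1 without a tuned normalizer. Other monotone mappings would serve the same purpose.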