Implementing Active Learning Algorithms for Cost-Sensitive Multiclass Classification Using Probabilistic Models

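Active learning for cost-sensitive multiclass classification is a relatively new research direction. To address this problem, this thesis proposes two active learning strategies that explicitly take costs into account: maximum expected cost and cost-weighted minimum margin. Both strategies can be viewed as extensions of existing cost-insensitive strategies. The experimental results show that, in a cost-sensitive setting, the cost-sensitive strategies perform well and clearly outperform the cost-insensitive ones. The results also show that the hardness of the training data affects, to some degree, the performance of cost-sensitive active learning algorithms. Therefore, in practical active learning applications, it is preferable to analyze the characteristics of the data before choosing an active learning strategy.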
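
As a rough illustration of the maximum expected cost strategy, the Python sketch below assumes that a probabilistic model supplies an estimate P(y|x) for every unlabeled instance and that a cost matrix cost[y][k] gives the cost of predicting class k when the true class is y. It further assumes that each instance is scored by the expected cost of its minimum-expected-cost (Bayes-optimal) prediction and that the instance with the largest score is queried; the exact scoring in the thesis may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of predicting each class k: sum over y of P(y|x) * cost[y][k]."""
    n = len(probs)
    return [sum(probs[y] * cost[y][k] for y in range(n)) for k in range(n)]


def max_expected_cost_query(pool_probs: Sequence[Sequence[float]],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the index of the unlabeled instance to query.

    Assumption (illustrative only): an instance's score is the expected cost
    of its best (minimum expected cost) prediction, and the instance whose
    best prediction is still the most costly is queried.
    """
    scores = [min(expected_costs(p, cost)) for p in pool_probs]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: three classes where confusing class 0 with class 2 is expensive.
cost = [[0, 1, 10], [1, 0, 1], [10, 1, 0]]
pool = [[0.9, 0.05, 0.05], [0.5, 0.0, 0.5], [0.1, 0.8, 0.1]]
print(max_expected_cost_query(pool, cost))
```

Here the second instance is queried: even its best prediction (class 1) has an expected cost of 1.0, higher than the best predictions for the other two instances.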

Keywords

Computer Science; Machine Learning; Multiclass Classification; Cost-Sensitive; Active Learning Algorithms

English Abstract

Multiclass cost-sensitive active learning is a relatively new problem. In this thesis, we derive the maximum expected cost and cost-weighted minimum margin strategies for multiclass cost-sensitive active learning. These two strategies can be seen as extended versions of classical cost-insensitive active learning strategies. The experimental results demonstrate that the derived strategies are promising for cost-sensitive active learning. In particular, the cost-sensitive strategies outperform the cost-insensitive ones on many benchmark data sets. The results also reveal how the hardness of the data affects the performance of active learning strategies. Thus, in practical active learning applications, analyzing the data before selecting a strategy can be important.
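
A comparable Python sketch for the cost-weighted minimum margin strategy follows. It assumes, without confirmation from the abstract, that the margin of an instance is the gap between its two smallest expected costs and that the instance with the smallest gap is queried. Under a 0/1 cost matrix the expected cost of predicting class k is 1 - P(k|x), so this gap reduces to the classical minimum margin between the two largest posterior probabilities, which is one way such a strategy could extend its cost-insensitive counterpart. The function names are hypothetical, and at least two classes are required.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of predicting each class k: sum over y of P(y|x) * cost[y][k]."""
    n = len(probs)
    return [sum(probs[y] * cost[y][k] for y in range(n)) for k in range(n)]


def cost_margin(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> float:
    """Assumed margin: second-smallest minus smallest expected cost (needs >= 2 classes)."""
    ec = sorted(expected_costs(probs, cost))
    return ec[1] - ec[0]


def cost_weighted_min_margin_query(pool_probs: Sequence[Sequence[float]],
                                   cost: Sequence[Sequence[float]]) -> int:
    """Return the index of the instance whose two best predictions are closest in expected cost."""
    margins = [cost_margin(p, cost) for p in pool_probs]
    return min(range(len(margins)), key=margins.__getitem__)


# Usage: with a 0/1 cost matrix the margin equals P(top class) - P(runner-up).
cost01 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(cost_margin([0.6, 0.3, 0.1], cost01))  # approximately 0.3
```

Because the margin is computed in units of cost rather than probability, a large cost gap between the two best predictions makes an instance less likely to be queried even when their probabilities are close.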

English Keywords

Computer Science; Machine Learning; Multi-class Classification; Cost-sensitive; Active Learning

