Active learning is an important machine learning setting for reducing human labelling effort. Most existing works assume that every labelling query incurs the same annotation cost, but this assumption may not be realistic: annotation costs can vary across data instances, and the cost of a query may be unknown before the query is made. Traditional active learning algorithms cannot handle such a realistic scenario. In this work, we study annotation-cost-sensitive active learning algorithms, which must estimate the utility and the cost of each query simultaneously. We propose a novel algorithm, cost-sensitive tree sampling (CSTS), that couples the two estimation tasks within a tree-structured model motivated by hierarchical sampling, a well-known algorithm for traditional active learning. By combining multiple tree-structured models, we further propose and discuss an extension of CSTS, the cost-sensitive forest sampling (CSFS) algorithm. Extensive experiments on data sets with both simulated and real annotation costs validate that the proposed methods are generally superior to other annotation-cost-sensitive algorithms.
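To make the problem setting concrete, the following is a minimal generic sketch of annotation-cost-sensitive query selection, where utility and cost are estimated per instance and a query is chosen by its utility-to-cost trade-off. This is an illustrative baseline only, not the CSTS or CSFS algorithm; the estimator functions and toy values are hypothetical.

```python
def select_query(pool, estimate_utility, estimate_cost):
    """Pick the unlabelled instance with the best estimated
    utility-to-cost ratio (a simple cost-sensitive criterion)."""
    return max(pool, key=lambda x: estimate_utility(x) / estimate_cost(x))

# Toy example with hypothetical per-instance estimates
# (instance costs vary and are only estimates before querying).
pool = [0, 1, 2, 3]
utility = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.2}   # estimated informativeness
cost = {0: 3.0, 1: 1.0, 2: 1.0, 3: 0.5}      # estimated annotation cost

best = select_query(pool, utility.get, cost.get)
print(best)  # instance 2: high utility at low estimated cost
```

A cost-insensitive strategy would pick instance 0 (highest utility) despite its large annotation cost; weighting utility by estimated cost changes the choice, which is the core trade-off the proposed methods address.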