
Annotation Cost-sensitive Active Learning by Tree Sampling

Advisor: Hsuan-Tien Lin (林軒田)

Abstract


Active learning is an important machine learning setup for reducing human labelling effort. Most existing work assumes that every labelling query has the same annotation cost, but this assumption may not be realistic: annotation costs can vary between data instances, and they may be unknown before the query is made. Traditional active learning algorithms cannot handle such a scenario. In this work, we study annotation-cost-sensitive active learning algorithms, which need to estimate the utility and the cost of each query simultaneously. We propose a novel algorithm, cost-sensitive tree sampling (CSTS), that conducts the two estimation tasks together and solves them with a tree-structured model motivated by hierarchical sampling, a well-known algorithm for traditional active learning. We also propose and discuss an extension of CSTS, cost-sensitive forest sampling (CSFS), which combines multiple tree-structured models. Extensive experiments on data sets with simulated and true annotation costs validate that the proposed methods are generally superior to other annotation-cost-sensitive algorithms.

