Solving Cost-Sensitive Multi-Label Classification with Progressive Random k-Labelsets

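In real-world applications, different multi-label problems often call for different evaluation criteria, so taking the criterion into account inside the learning algorithm has become an important issue. We call this problem cost-sensitive multi-label classification. Most existing methods cannot handle arbitrary criteria, while the remaining cost-sensitive methods suffer from excessive time complexity. In this work, we propose the progressive random k-labelsets algorithm to address both problems. The algorithm extends the well-known random k-labelsets algorithm and therefore shares its efficiency. Moreover, it progressively transforms the original problem into a series of cost-sensitive multi-class classification problems, which allows it to handle general evaluation criteria. Experimental results show that progressive random k-labelsets performs on par with algorithms designed specifically for particular criteria, and significantly outperforms the other methods under the remaining criteria.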
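To make "different evaluation criteria" concrete, the sketch below (an illustration, not code from this work) implements three common multi-label losses and scores the same prediction under each. The function names are hypothetical, and treating two empty label sets as a perfect match in the F1 loss is an assumed convention.

```python
from typing import Sequence

def hamming_loss(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """Fraction of individual labels predicted incorrectly."""
    return sum(a != b for a, b in zip(y, y_hat)) / len(y)

def f1_loss(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """1 - F1 between the true and predicted sets of relevant labels."""
    tp = sum(a == 1 and b == 1 for a, b in zip(y, y_hat))
    denom = sum(y) + sum(y_hat)
    if denom == 0:
        return 0.0  # assumed convention: both sets empty counts as a perfect match
    return 1.0 - 2.0 * tp / denom

def subset_zero_one_loss(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """1 unless the whole label vector is predicted exactly."""
    return 0.0 if list(y) == list(y_hat) else 1.0

# One prediction, three different costs.
y = [1, 0, 0, 0]
y_hat = [1, 1, 0, 0]
print(hamming_loss(y, y_hat))          # 0.25
print(f1_loss(y, y_hat))               # ≈ 0.333
print(subset_zero_one_loss(y, y_hat))  # 1.0
```

A method tuned for one of these losses is not automatically good under the others, which is why a cost-sensitive method takes the loss itself as an input.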

Keywords

machine learning; multi-label classification; loss function; cost-sensitive; labelset; ensemble method

English Abstract

Many real-world applications of multi-label classification come with different performance evaluation criteria. It is thus important to design general multi-label classification methods that can flexibly take different criteria into account. Such methods tackle the problem of cost-sensitive multi-label classification (CSMLC). Most existing CSMLC methods either suffer from high computational complexity or focus only on certain specific criteria. In this work, we propose a novel CSMLC method, named progressive random k-labelsets (PRAKEL), to resolve both issues. The method extends a popular multi-label classification method, random k-labelsets, and hence inherits its efficiency. Furthermore, it can handle general evaluation criteria by progressively transforming the CSMLC problem into a series of cost-sensitive multi-class classification problems. Experimental results demonstrate that PRAKEL is competitive with existing methods under the specific criteria they can optimize, and is superior under general criteria.
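The transformation into cost-sensitive multi-class problems can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the exact PRAKEL cost definition: a random k-labelset is drawn as in random k-labelsets, each of the 2^k assignments to that labelset becomes one class, and a class's cost for an example is the multi-label loss of the full vector obtained by writing the assignment into a hypothetical "reference" prediction (for instance, the current ensemble's output). The names `sample_labelset`, `cost_vectors` and `reference` are all hypothetical.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Loss = Callable[[Sequence[int], Sequence[int]], float]

def hamming_loss(y: Sequence[int], y_hat: Sequence[int]) -> float:
    return sum(a != b for a, b in zip(y, y_hat)) / len(y)

def sample_labelset(num_labels: int, k: int, rng: random.Random) -> List[int]:
    """Draw k distinct label indices (a random k-labelset)."""
    return sorted(rng.sample(range(num_labels), k))

def cost_vectors(
    Y: Sequence[Sequence[int]],
    reference: Sequence[Sequence[int]],
    labelset: Sequence[int],
    loss: Loss,
) -> Tuple[List[Tuple[int, ...]], List[List[float]]]:
    """Turn one labelset into a cost-sensitive multi-class problem.

    Classes are the 2^k 0/1 assignments to `labelset`. The cost of a class
    for an example is `loss(y, full)`, where `full` is that example's
    (hypothetical) reference vector with the assignment written into the
    labelset positions.
    """
    classes = list(itertools.product((0, 1), repeat=len(labelset)))
    costs = []
    for y, ref in zip(Y, reference):
        row = []
        for assignment in classes:
            full = list(ref)
            for pos, bit in zip(labelset, assignment):
                full[pos] = bit
            row.append(loss(y, full))
        costs.append(row)
    return classes, costs

Y = [[1, 0, 1, 0], [0, 1, 0, 0]]          # true label vectors, 4 labels
reference = [[0, 0, 0, 0], [0, 0, 0, 0]]  # assumed initial all-negative prediction
S = [0, 2]                                 # fixed labelset for reproducibility
classes, costs = cost_vectors(Y, reference, S, hamming_loss)
print(classes)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(costs)    # [[0.5, 0.25, 0.25, 0.0], [0.25, 0.5, 0.5, 0.75]]
print(sample_labelset(4, 2, random.Random(0)))  # two distinct indices in 0..3
```

Because the cost is computed on the full label vector, any loss function (Hamming, F1, and so on) can be plugged in unchanged. One plausible way to make the scheme progressive, assumed here for illustration, is to train a cost-sensitive multi-class learner on the feature vectors paired with these cost rows and to write its predictions back into `reference` before the next labelset is drawn.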

English Keywords

machine learning; multi-label classification; loss function; cost-sensitive; labelset; ensemble method
