Condensed Filter Tree for Cost-Sensitive Multi-Label Classification

Advisor: Hsuan-Tien Lin (林軒田)

Abstract

Many real-world applications in recent years call for better multi-label classification algorithms, and different applications often need to consider different evaluation criteria. We formalize this need with a general setup, cost-sensitive multi-label classification (CSMLC), which takes the evaluation criterion into account during the learning process. Nevertheless, most existing algorithms can only focus on optimizing a few specific evaluation criteria and cannot systematically deal with different criteria. In this thesis, we propose a novel algorithm, called the condensed filter tree (CFT), for optimizing any criterion in CSMLC. CFT is derived by reducing CSMLC to the well-known filter tree algorithm for cost-sensitive multi-class classification via the simple label powerset approach. By carefully designing the tree structure and focusing on the key nodes, we successfully cope with the difficulty of having exponentially many extended classes within the powerset for representation, training, and prediction. Experimental results across many real-world datasets validate that the proposed CFT achieves better performance under many general evaluation criteria than existing special-purpose algorithms.
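To make the setup concrete, the minimal Python sketch below illustrates the two ingredients behind the reduction described above: a criterion-dependent cost function over label vectors, and the label powerset view in which each of the 2^K possible label vectors becomes one extended class of a cost-sensitive multi-class problem. It shows only the problem setup, not the CFT algorithm itself; the particular cost functions (Hamming loss and an F1-based cost) and all names are illustrative assumptions rather than part of the thesis.

import itertools
import numpy as np

def hamming_cost(y, y_hat):
    """Fraction of the K labels that are predicted incorrectly."""
    return float(np.mean(y != y_hat))

def f1_cost(y, y_hat):
    """1 - F1 score, so that lower is better (another common CSMLC criterion)."""
    tp = int(np.sum(y & y_hat))
    denom = int(np.sum(y) + np.sum(y_hat))
    return 0.0 if denom == 0 else 1.0 - 2.0 * tp / denom

K = 3  # number of labels (kept tiny; the powerset has 2**K extended classes)

# Label powerset view: every label vector in {0,1}^K is one extended class.
powerset = [np.array(bits) for bits in itertools.product((0, 1), repeat=K)]

# For one example with true label vector y, the chosen criterion induces a
# cost for predicting each extended class, giving a cost vector of length 2**K.
y = np.array([1, 0, 1])
costs = [f1_cost(y, c) for c in powerset]
for c, cost in zip(powerset, costs):
    print(c, round(cost, 3))

Under this view, the cost vector over the 2^K extended classes is exactly the kind of input a cost-sensitive multi-class learner such as the filter tree consumes; the exponential number of extended classes is the difficulty that the condensed filter tree addresses by carefully designing the tree structure and focusing only on the key nodes.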

