
Cyclic Classifier Chain for Multilabel Classification

Advisor: Hsuan-Tien Lin

Abstract


This thesis proposes the Cyclic Classifier Chain (CCC) to efficiently tackle multilabel classification, a problem commonly encountered in the real world. The idea of CCC is to cyclically train multiple Classifier Chains (CC). Doing so reduces the influence of label ordering: in the original CC, each classifier can only take the labels trained earlier in the chain as additional features, whereas CCC lets every classifier access estimates of all labels. Our experiments show that after a few cycles of training, CCC effectively reduces the impact of different label orderings on the results and outperforms CC. Inspired by the Condensed Filter Tree (CFT) algorithm, we also extend CCC to solve general cost-sensitive multilabel classification. Because different applications very often evaluate predictions with different cost functions, cost-sensitive multilabel classification is an important problem. Since each classifier can see the other labels during training, we can compute the cost of classifying a label in every possible way and embed this cost into the weights of the training examples. This approach can minimize any example-based cost, and our experiments show that it outperforms Probabilistic Classifier Chain (PCC), another cost-sensitive method, and is on par with or even slightly better than CFT. To make predictions more stable, we further propose a gradient-boosted version of CCC, which performs slightly better than CCC and can still be trained efficiently. We also use regression trees as base learners to obtain a non-linear method for general cost-sensitive multilabel classification, whereas CFT is difficult to train with non-linear methods because of its high training time complexity.
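To make the cyclic training procedure concrete, the following is a minimal sketch under stated assumptions, not the thesis implementation: it uses scikit-learn logistic regression as the binary base learner, zero-initialized label estimates (so the first cycle behaves roughly like an ordinary classifier chain), and illustrative names such as train_ccc, predict_ccc, and n_cycles that do not appear in the thesis.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ccc(X, Y, n_cycles=3):
    # Y is an (n_samples, n_labels) 0/1 matrix; label estimates start at zero,
    # so the first cycle roughly mimics an ordinary classifier chain.
    n_labels = Y.shape[1]
    Y_hat = np.zeros_like(Y, dtype=float)
    chains = []
    for _ in range(n_cycles):
        cycle = []
        for k in range(n_labels):
            # use the current estimates of all OTHER labels as extra features
            others = np.delete(Y_hat, k, axis=1)
            clf = LogisticRegression().fit(np.hstack([X, others]), Y[:, k])
            Y_hat[:, k] = clf.predict(np.hstack([X, others]))  # refresh estimate
            cycle.append(clf)
        chains.append(cycle)
    return chains

def predict_ccc(chains, X):
    n_labels = len(chains[0])
    Y_hat = np.zeros((X.shape[0], n_labels))
    for cycle in chains:  # replay the classifiers cycle by cycle, same label order
        for k, clf in enumerate(cycle):
            others = np.delete(Y_hat, k, axis=1)
            Y_hat[:, k] = clf.predict(np.hstack([X, others]))
    return Y_hat

After a few cycles the label estimates stabilize, which is what reduces the sensitivity to the chosen label order.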

English Abstract


We propose a novel method, Cyclic Classifier Chain (CCC), for multilabel classification. CCC extends the classic Classifier Chain (CC) method by cyclically training multiple chains of labels. Three benefits immediately follow from the cyclic design. First, CCC resolves the critical issue of label ordering in CC, and therefore reaches more stable performance. Second, CCC matches the task of cost-sensitive multilabel classification, an important problem for satisfying application needs. The cyclic aspect of CCC allows estimating all labels during training, and such estimates make it possible to embed the cost information into the weights of labels. Experimental results justify that cost-sensitive CCC can be superior to state-of-the-art cost-sensitive multilabel classification methods. Third, CCC can be easily coupled with gradient boosting to inherit the advantages of ensemble learning. In particular, gradient-boosted CCC efficiently reaches promising performance for both linear and non-linear base learners. The three benefits (stability, cost-sensitivity, and efficiency) make CCC a competitive method for real-world applications.
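As a sketch of how the cost information can be embedded into example weights, the following illustrates one common reduction under assumptions: the function name, the logistic-regression base learner, and the user-supplied cost_fn are illustrative rather than taken from the thesis. For each example, the classifier of label k is trained toward the lower-cost choice of that label, weighted by the cost difference between the two choices while the other labels are held at their current estimates.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cost_sensitive_label(X, Y, Y_hat, k, cost_fn):
    # cost_fn(y_true, y_pred) returns the example-based cost of predicting the
    # full label vector y_pred when the ground truth is y_true (e.g. Hamming loss).
    n = X.shape[0]
    features = np.hstack([X, np.delete(Y_hat, k, axis=1)])
    targets = np.empty(n, dtype=int)
    weights = np.empty(n)
    for i in range(n):
        pred = Y_hat[i].copy()
        pred[k] = 0
        c0 = cost_fn(Y[i], pred)   # cost if label k is predicted negative
        pred[k] = 1
        c1 = cost_fn(Y[i], pred)   # cost if label k is predicted positive
        targets[i] = int(c1 < c0)  # the lower-cost choice becomes the target
        weights[i] = abs(c0 - c1)  # the cost difference becomes the example weight
    clf = LogisticRegression()
    clf.fit(features, targets, sample_weight=weights)
    return clf

Plugging such a cost-weighted classifier into every (cycle, label) step of the cyclic loop yields a cost-sensitive chain that can target any example-based cost.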
