A Nearest-Neighbor-Based Ranking Approach for Multi-Label Classification

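In recent years, multi-label classification has attracted growing attention in the field of machine learning. For this problem, this study proposes a nearest-neighbor-based ranking approach. A ranking model is used to redefine the importance of the neighbors and to select those whose labels are more likely to be correct; the more likely a neighbor's labels are to be correct, the higher that neighbor is ranked. Based on this ranking, we use weighted voting to determine the final prediction. To determine the weights, we formulate an optimization problem and solve it to find the weight that should be assigned to each rank. We collect data sets from various real-world domains for the experiments and compare our method with other well-known algorithms that also use nearest neighbors. The experimental results show that the proposed method generally performs well, and on some problems it slightly outperforms other well-known nearest-neighbor-based algorithms. Based on these results, we believe that making proper use of nearest neighbors is very helpful for solving multi-label classification problems.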

Keywords

machine learning; data mining; multi-label classification

Parallel Abstract

Multi-label classification has attracted a great deal of attention in recent years. This paper presents an interesting finding, namely that being able to identify neighbors with trustworthy labels can significantly improve classification accuracy. Based on this finding, we propose a k-nearest-neighbor-based ranking approach to solve the multi-label classification problem. The approach exploits a ranking model to learn which neighbors' labels are more trustworthy candidates for a weighted KNN-based strategy, and then assigns higher weights to those candidates when making weighted-voting decisions. The weights are determined by using a generalized pattern search technique. We collect several real-world data sets from various domains for the experiments. Our experimental results demonstrate that the proposed method outperforms state-of-the-art instance-based learning approaches. We believe that appropriately exploiting k-nearest neighbors is useful for solving the multi-label problem.
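
The rank-weighted voting step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `rank_weighted_vote`, the 0.5 threshold, and the toy data are all hypothetical, neighbors are ranked by plain Euclidean distance as a stand-in for the learned ranking model, and the per-rank weights are set by hand rather than found by pattern search.

```python
import math

def rank_weighted_vote(query, train_X, train_Y, rank_weights, threshold=0.5):
    """Predict a label vector for `query` by rank-weighted neighbor voting.

    Assumption: neighbors are ranked by Euclidean distance, which stands in
    for the learned ranking model; `rank_weights[r]` is the (hand-picked)
    weight given to the neighbor at rank r.
    """
    k = len(rank_weights)
    # Order training indices by distance to the query and keep the k closest.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(query, train_X[i]))[:k]
    total = sum(rank_weights)
    n_labels = len(train_Y[0])
    scores = [0.0] * n_labels
    for rank, idx in enumerate(order):
        for j in range(n_labels):
            # Each neighbor votes for its own labels with its rank's weight.
            scores[j] += rank_weights[rank] * train_Y[idx][j]
    # A label is predicted when its normalized vote reaches the threshold.
    return [1 if s / total >= threshold else 0 for s in scores]

# Hypothetical toy data: four training points with two binary labels each.
train_X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_Y = [[1, 0], [1, 1], [0, 1], [0, 1]]
query = (0.0, 0.1)

uniform = rank_weighted_vote(query, train_X, train_Y, [1.0, 1.0, 1.0])
ranked = rank_weighted_vote(query, train_X, train_Y, [0.6, 0.3, 0.1])
```

On this toy data the two weightings disagree on the second label, which is the effect the abstract relies on: giving higher-ranked neighbors more weight changes which labels survive the vote.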

Parallel Keywords

machine learning; data mining; multi-label classification
