  • Thesis

Investigations in Applying Multilabel Classification (探索多標籤分類的應用)

Advisor: Chih-Jen Lin (林智仁)

Abstract

Prediction using the ground truth sounds like an oxymoron in machine learning. However, such an unrealistic setting has been used in hundreds, if not thousands, of papers on finding graph representations. To evaluate the obtained representations on the multi-label problem of node classification, many works assume that the number of labels of each test instance is known in the prediction stage. In practice, such ground-truth information is rarely available, yet we point out that this inappropriate setting is now ubiquitous in this research area. We investigate in detail why the situation occurs. Our analysis indicates that, with unrealistic information, the performance is likely overestimated. To see why suitable predictions were not used, we identify difficulties in applying some multi-label techniques. For use in future studies, we propose simple and effective settings that do not rely on practically unknown information. Finally, we take this chance to compare major graph-representation learning methods on multi-label node classification.
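
The setting criticized above can be illustrated with a minimal sketch. It assumes a classifier has already produced a real-valued decision score for every (test instance, label) pair. The function names, the toy scores, and the zero threshold are all hypothetical choices made for illustration; in particular, thresholding at zero is only a generic realistic baseline, not necessarily the setting the thesis proposes.

```python
import numpy as np


def predict_with_known_label_count(scores, y_true):
    """Unrealistic setting: peek at each test instance's true label count k_i
    and predict the k_i labels with the highest decision scores."""
    pred = np.zeros_like(y_true)
    for i, k in enumerate(y_true.sum(axis=1)):
        if k > 0:
            top = np.argsort(-scores[i])[:k]  # indices of the k largest scores
            pred[i, top] = 1
    return pred


def predict_with_threshold(scores, threshold=0.0):
    """Realistic baseline (an assumption of this sketch): predict every label
    whose decision score exceeds a fixed threshold; no test labels are used."""
    return (scores > threshold).astype(int)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (instance, label) pairs."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 3 test instances, 4 labels.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [1, 1, 1, 0]])
scores = np.array([[0.9, -0.2, 0.1, -0.3],
                   [-0.5, 0.4, 0.3, -0.1],
                   [0.2, -0.1, -0.4, -0.6]])

oracle_pred = predict_with_known_label_count(scores, y_true)
thresh_pred = predict_with_threshold(scores)

print("micro-F1 with known label counts:", micro_f1(y_true, oracle_pred))
print("micro-F1 with zero threshold:   ", micro_f1(y_true, thresh_pred))
```

On this toy data, the prediction that uses the true label counts scores higher than the threshold-based one, even though both start from the same decision scores. The gap comes only from information that would not be available at prediction time, which is the kind of over-estimation the abstract describes.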

Keywords

multi-label classification

