LaSER: A Framework for Improving Semi-Supervised Learning on Imbalanced Data via Distribution Estimation and Re-weighting

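Traditional semi-supervised learning (SSL) methods assume that the training data are evenly distributed across classes, that is, every class has the same number of training examples. Real-world data, however, are mostly class-imbalanced. This is a major challenge for traditional SSL algorithms, which usually perform poorly in this setting and become heavily biased toward predicting the classes with more training data. To address this problem, a line of research studies class-imbalanced semi-supervised learning (CISSL), which aims to make SSL algorithms less affected by imbalanced data. We find two directions in existing CISSL research: (1) improving the accuracy of pseudo-labels, and (2) combining SSL with class-imbalanced learning. These two directions address different aspects of the problem. In this thesis, we propose a new method that combines both, integrating DARP and Mixup-DRW, respectively, into existing SSL algorithms. In addition, we improve label shift estimation (LSE) on imbalanced data, which further improves the performance and stability of SSL across a variety of environments and settings.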

Keywords

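machine learning; semi-supervised learning; imbalanced learning; label shift estimation; image classification; class-imbalanced semi-supervised learning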

Parallel Abstract

Semi-supervised learning (SSL) has traditionally relied on the assumption that the training data are evenly distributed across classes. However, real-world datasets often have imbalanced or long-tailed class distributions. This poses a significant challenge for traditional SSL methods, which tend to perform poorly under such conditions and become biased toward the majority classes. To address this problem, a variant of SSL known as class-imbalanced semi-supervised learning (CISSL) has been introduced, which is specifically designed to be more robust to imbalanced data. We find two approaches in existing CISSL work: (1) enhancing the quality of pseudo-labels, and (2) adapting imbalanced learning techniques to SSL. The two approaches address different aspects of the problem. In this thesis, we propose a novel method that combines the two approaches by integrating DARP and Mixup-DRW, respectively, into existing SSL algorithms. Additionally, we improve existing label shift estimation (LSE) in CISSL settings, resulting in enhanced performance and robustness of SSL under various conditions.

Parallel Keywords

machine learning; semi-supervised learning; imbalanced learning; label shift estimation; image classification; class-imbalanced semi-supervised learning
