
Few-Shot Visual Classification with Improved Self-Supervised Metric Learning and Task-Aware Transformation

Advisor: 林嘉文

Abstract


With the rapid development of deep learning and the growing availability of datasets, data-driven convolutional neural networks (CNNs) trained on large amounts of data substantially outperform traditional computer-vision algorithms on image classification. However, collecting data is time-consuming and labeling it is costly, which causes considerable inconvenience when facing new applications. Training a neural network with a huge number of parameters on only a few samples under conventional supervised learning almost always leads to severe over-fitting. In recent years, researchers have therefore developed a new field, few-shot learning, which aims to train reliable models even when data are limited. The dominant approach to few-shot learning is the meta-learning framework, which can be divided into gradient-based and metric-based meta-learning. More recently, some studies have focused on exploiting self-supervised learning to learn more general features. Meanwhile, other recent works have abandoned the meta-learning framework in favor of traditional supervised learning, concentrating on how to obtain higher-quality features; they find that competitive performance can be achieved even without meta-learning, and that simple baselines can beat some recent, architecturally complex meta-learning methods. This raises an intriguing question: if the two learning paradigms acquire different kinds of knowledge, can optimizing both objectives simultaneously further improve few-shot image classification? In this work, we combine three learning objectives: feature-embedding (metric-based) meta-learning, standard supervised learning, and self-supervised learning. We further introduce a task-aware projection mechanism into meta-learning that transforms sample embeddings into a more reliable space for classification. Experiments show a significant improvement on the over-fitting dilemma of few-shot classification: our method not only greatly surpasses traditional approaches but also achieves competitive performance on mainstream datasets under various settings.
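The abstract describes combining a metric-based episodic objective with supervised and self-supervised objectives. The sketch below is a minimal NumPy illustration of that idea, not the thesis's actual implementation: the prototypical-style episodic loss, the weighted sum, and the weights `w_sup`/`w_ssl` are all assumptions made for illustration.

```python
import numpy as np

def softmax_ce(logits, y):
    """Cross-entropy over rows of `logits` against integer labels `y`."""
    z = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def prototypical_loss(support, support_y, query, query_y, n_way):
    """Metric-based episodic loss: classify queries by distance to class prototypes."""
    # prototype = mean support embedding of each class, shape (n_way, d)
    protos = np.stack([support[support_y == c].mean(axis=0) for c in range(n_way)])
    # negative squared Euclidean distances serve as logits, shape (n_query, n_way)
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return softmax_ce(-d2, query_y)

def total_loss(l_meta, l_sup, l_ssl, w_sup=1.0, w_ssl=0.5):
    """Weighted sum of the three objectives (weights are illustrative).

    `l_sup` would come from a standard classification head over base classes,
    and `l_ssl` from a self-supervised pretext head (e.g. rotation prediction);
    both are assumed here to be precomputed cross-entropy scalars.
    """
    return l_meta + w_sup * l_sup + w_ssl * l_ssl
```

In a real model the three losses would share one feature encoder, so each objective regularizes the embedding the others use; the sketch only shows how the episodic term is computed and combined.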

English Abstract


With the rapid development of deep learning and the emergence of various datasets, convolutional neural networks trained on large amounts of data hold a considerable lead over traditional computer-vision algorithms in visual classification. However, data collection is time-consuming and labeling is costly, which brings inconvenience when implementing new applications. Following common supervised learning but training a complicated neural network on a small number of samples almost always causes severe overfitting. In recent years, scholars have begun to develop a new field, few-shot learning, which tries to train a reliable model even with limited data. At present, the main route of few-shot learning is the meta-learning architecture, which can be divided into gradient-based and metric-based meta-learning. Recently, some researchers have focused on training better-quality feature extractors, using self-supervised learning mechanisms to learn more general features. Other recent few-shot classification works rethink the necessity of meta-learning and use standard supervised learning instead, finding that even simple baselines can beat recent complex meta-learning methods. In this work, we combine meta-learning, traditional supervised learning, and self-supervised learning, and further introduce a task-aware projection mechanism that transforms the original feature embeddings into a more reliable space for classification. Experiments show a significant improvement on the over-fitting dilemma of the few-shot visual classification task.
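The task-aware projection is described only at a high level. As a toy illustration of the general idea, the sketch below conditions a simple transform on the current episode's support set before nearest-prototype classification; the specific choice of per-task standardization is an assumption for illustration, not the thesis's mechanism.

```python
import numpy as np

def task_aware_transform(support, query):
    """Project embeddings into a task-conditioned space.

    Here the "task" information is just the support set's per-dimension mean
    and standard deviation (an illustrative stand-in for a learned projection);
    the same transform is applied to support and query embeddings.
    """
    mu = support.mean(axis=0)
    sigma = support.std(axis=0) + 1e-6   # avoid division by zero
    return (support - mu) / sigma, (query - mu) / sigma

def nearest_prototype(support, support_y, query, n_way):
    """Classify each query by its nearest class prototype in the task space."""
    s, q = task_aware_transform(support, query)
    protos = np.stack([s[support_y == c].mean(axis=0) for c in range(n_way)])
    d2 = ((q[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)             # predicted class index per query
```

The key design point the sketch mirrors is that the transform's parameters are derived from the support set alone, so each episode gets its own classification space rather than one fixed embedding for all tasks.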

