
Distilling One-Stage Object Detection Network via Decoupling Foreground and Background Features

Advisor: 顏淑惠

Abstract


Knowledge distillation is a popular approach to compressing convolutional networks. Its core idea is to train a lightweight network under the guidance of a large, already-trained network with solid performance, using the large network's predictions, feature maps, and other information to direct the lightweight network's learning. Under the same computational requirements, a lightweight network obtained this way performs better than one trained in the ordinary manner. The guiding large network is commonly called the teacher network, and the guided lightweight network the student network. Most knowledge distillation papers use the KL divergence to measure how much the class probability distributions predicted by the teacher and the student differ; when the difference is small, the student is considered to be very similar to the teacher and distillation is deemed successful. However, we found cases in which the two predicted distributions differ very little while the directions of the corresponding feature vectors differ greatly. In this thesis, we propose using cosine similarity to align the directions of the foreground features of the two networks, enforcing their consistency from another perspective, and we add adaptive learning to the distillation process: by analyzing whether the teacher network's prediction is better than the student network's, we decide whether the student should accept the teacher's guidance.
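This observation is easy to verify numerically. Below is a minimal sketch (the vectors are made-up illustrations, not values from the thesis): two logit vectors that point in clearly different directions can still produce almost identical distributions after Softmax, so the KL divergence stays tiny while the cosine similarity reveals the directional gap.

```python
# Illustrative example only: different directions, nearly identical Softmax outputs.
import torch
import torch.nn.functional as F

teacher_logits = torch.tensor([6.0, 0.0, 0.0])    # hypothetical teacher vector
student_logits = torch.tensor([6.0, -6.0, -6.0])  # hypothetical student vector

# KL(teacher || student): F.kl_div takes log-probabilities as input and
# probabilities as target.
kl = F.kl_div(student_logits.log_softmax(dim=0),
              teacher_logits.softmax(dim=0),
              reduction="sum")

# Cosine similarity of the raw vectors measures how well their directions agree.
cos = F.cosine_similarity(teacher_logits, student_logits, dim=0)

print(f"KL divergence   : {kl.item():.4f}")   # ~0.02 -> distributions look alike
print(f"Cosine similarity: {cos.item():.3f}")  # ~0.58 -> directions ~55 degrees apart
```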

Abstract (English)


Knowledge distillation is a popular method for compressing convolutional networks. The main idea is to use a well-performing, large-scale trained network to guide a small-scale network during training. By transferring knowledge such as the feature maps and prediction results of the large-scale network, the small-scale network can learn better. Therefore, under the same computational demands, a small-scale network trained with knowledge distillation performs better than one trained without it. The large-scale network and the small-scale network are usually called the teacher network and the student network, respectively. In recent work on knowledge distillation for classification, the KL divergence between the predicted probabilities of the teacher and the student is commonly used to measure the difference between their predictions, so the KL divergence loss can serve as a training guide for the student network. The smaller it is, the better, since a small value implies that the student network behaves similarly to the teacher network. However, we found that two feature vectors can point in very different directions and yet, after Softmax, yield a very small KL divergence loss. In this paper, we propose a cosine-similarity loss that encourages similar directions of the foreground feature vectors, together with a KL divergence loss that constrains the teacher and student models to produce similar predictions on the background. We also propose an adaptive learning strategy in which the student learns from the teacher only when the teacher performs better than the student.
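As a rough illustration of how these pieces could fit together, the sketch below combines a cosine-similarity loss on foreground feature locations, a KL divergence loss on background predictions, and an adaptive gate that keeps the distillation term only when the teacher outperforms the student. All names, tensor shapes, the mask construction, and the quality score used for gating are assumptions for illustration, not the thesis' actual implementation.

```python
# Rough sketch of a decoupled distillation loss (assumed shapes, mask
# construction, and gating criterion; not the thesis' exact code).
import torch
import torch.nn.functional as F

def decoupled_distill_loss(student_feat, teacher_feat,
                           student_logits, teacher_logits, fg_mask):
    """student_feat / teacher_feat : (N, C, H, W) backbone feature maps
    student_logits / teacher_logits: (N, K, H, W) per-location class scores
    fg_mask: (N, 1, H, W) binary mask, 1 inside ground-truth boxes (assumed)."""
    bg_mask = 1.0 - fg_mask

    # Foreground: pull the student's feature *direction* toward the teacher's
    # with a (1 - cosine similarity) loss at each foreground location.
    cos = F.cosine_similarity(student_feat, teacher_feat, dim=1)      # (N, H, W)
    fg_loss = ((1.0 - cos) * fg_mask.squeeze(1)).sum() / fg_mask.sum().clamp(min=1.0)

    # Background: match the predicted class distributions with KL divergence.
    kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                  F.softmax(teacher_logits, dim=1),
                  reduction="none").sum(dim=1)                        # (N, H, W)
    bg_loss = (kl * bg_mask.squeeze(1)).sum() / bg_mask.sum().clamp(min=1.0)

    return fg_loss + bg_loss

def adaptive_gate(distill_loss, teacher_score, student_score):
    """Adaptive learning (assumed form): keep the distillation term only when
    the teacher's per-image quality score (e.g. mean IoU against the labels)
    beats the student's."""
    gate = (teacher_score > student_score).float()
    return gate * distill_loss
```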

