
基於知識交換之多任務學習訓練

An Effective and Efficient Learning Method for Knowledge Exchanging on Multi-task Learning

Advisor: 簡韶逸

Abstract


With the rapid development of deep learning and neural networks, many high-performance but very large models have come into wide use. However, when it comes to training and inference on edge devices, such models are impractical because they demand enormous computation and storage resources. Lightweight models address this problem, but some loss of performance across tasks is unavoidable, especially in multi-task learning. In other words, it is very difficult for a lightweight model to handle two, three, or more tasks at once while relying on the same feature parameters. This thesis studies this problem and proposes a feasible, practical, and general training method that serves multiple tasks from a single set of features, without sacrificing performance and under constrained computational resources. Specifically, first, drawing on knowledge distillation, the thesis trains a series of expert models, each proficient at only a single task, and uses these experts to transfer useful knowledge to the target model. Second, self-distillation is applied to improve the performance of each task. Third, the self-distillation modules perform feature fusion, sharing useful information across different tasks and different layers to enrich the target model's knowledge. With this training method, the target model outperforms the original expert models on particular tasks, which indicates that the knowledge-exchange modules indeed exploit features from other tasks to aid training. Better still, compared with the original model, the proposed method achieves better performance with almost no additional computational cost or resource usage.

English Abstract


With the rapid growth of deep learning and neural networks, many powerful but very large models have been proposed. However, when it comes to training or inference on local or edge devices, deploying such models in the real world becomes a serious issue because of their enormous resource and computational costs. Lightweight models address this problem, but some performance drop is inevitable, especially in multi-task learning: it is challenging for a lightweight model to perform two or three tasks well with the same set of features. In this thesis, we dig into this problem and propose a practical training methodology that makes better use of the same features in multi-task training, without loss of performance or generality, under limited resource usage. To be specific, first, we apply knowledge distillation: we train several expert models, each proficient at only one task, and use each of them to teach the student model helpful knowledge in its particular field. Second, we apply self-distillation within each task. Third, through the self-distillation modules, we fuse features across different layers and different tasks while training. With this training methodology, the student model turns out to be even more robust on particular tasks, meaning that the fused features from other tasks do help. Better still, we achieve better performance at almost the same computational cost and resource usage as the original model.
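To make the three training signals described above concrete, the following is a minimal sketch of what one such training step could look like in PyTorch. It is not the thesis's actual implementation: the toy MultiTaskStudent architecture, the kd_loss helper, the temperature T, and the unweighted sum of loss terms are all illustrative assumptions, and the self-distillation and cross-task fusion terms are only indicated by a comment.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskStudent(nn.Module):
    """Toy lightweight student: one shared backbone, one head per task."""
    def __init__(self, num_tasks, feat_dim=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_tasks)]
        )

    def forward(self, x):
        feat = self.backbone(x)  # the single shared feature set
        return feat, [head(feat) for head in self.heads]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style soft-label distillation between two sets of logits."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

num_tasks = 2
student = MultiTaskStudent(num_tasks)
# Frozen single-task experts; stand-ins sharing the toy architecture here.
experts = [MultiTaskStudent(1) for _ in range(num_tasks)]
for expert in experts:
    expert.eval()

opt = torch.optim.SGD(student.parameters(), lr=0.01)
x = torch.randn(8, 3, 32, 32)                        # dummy image batch
y = [torch.randint(0, 10, (8,)) for _ in range(num_tasks)]

feat, logits = student(x)
loss = torch.zeros(())
for t in range(num_tasks):
    with torch.no_grad():
        _, expert_logits = experts[t](x)             # expert t teaches only task t
    loss = loss + F.cross_entropy(logits[t], y[t])   # supervised hard labels
    loss = loss + kd_loss(logits[t], expert_logits[0])  # expert distillation
    # The thesis's self-distillation and cross-task feature-fusion losses
    # would enter here as extra terms over intermediate features.

opt.zero_grad()
loss.backward()
opt.step()

In practice the experts would be pretrained on their own tasks before being frozen; the sketch only fixes the overall shape of the combined objective, in which every task contributes a supervised term plus a distillation term against its own expert.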
