  • Thesis

Integrating Neural Networks with an Automatic Search Algorithm

Automatic Search Algorithm for Integrated Neural Network

Advisor: 簡韶逸

Abstract


With the rapid development of deep learning, industry has begun to manufacture deep-learning products. Once a product is released, it is difficult to modify: if a use case was not anticipated at design time and new functionality is needed later, the only option is to re-manufacture the product, which is both time-consuming and expensive. Moreover, real-world deployments often have to solve several tasks at once, and deploying multiple models on resource-limited devices is challenging because most models demand high computational resources. In this thesis, we exploit the synergy among multiple pre-trained models and integrate them into a single multi-task neural network, so that desired functionality can be added by fine-tuning, and the high computational cost of deploying multiple models on resource-limited devices is avoided. The difficulty lies in finding the best junction at which to integrate the models so as to balance accuracy against computational cost. The most intuitive approach, exhaustively traversing every possible architecture, is extremely complex and requires enormous computational resources and time. We therefore propose an automatic search algorithm that finds the most suitable integrated network architecture, combined with the self-distillation technique to improve accuracy. Experimental results demonstrate the effectiveness of the method.
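The self-distillation idea mentioned above can be made concrete with a small numeric sketch. This is an assumed, standard formulation (not taken from the thesis): an auxiliary shallow classifier is trained with hard-label cross-entropy plus a temperature-softened KL term toward the deepest classifier, which serves as the in-network teacher. All function names and hyperparameter values (`t`, `alpha`) are illustrative.

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled softmax over a list of logits.
    m = max(x / t for x in logits)
    exps = [math.exp(x / t - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def self_distillation_loss(shallow_logits, deep_logits, label, t=4.0, alpha=0.5):
    """Loss for one auxiliary (shallow) classifier head.

    Combines hard-label cross-entropy with a KL term that pulls the
    shallow head toward the softened output of the deepest classifier,
    which acts as the in-network teacher.
    """
    p_hard = softmax(shallow_logits)
    ce = -math.log(p_hard[label] + 1e-12)        # hard-label cross-entropy
    p_t = softmax(deep_logits, t)                 # teacher, softened
    p_s = softmax(shallow_logits, t)              # student, softened
    kl = sum(pt * (math.log(pt + 1e-12) - math.log(ps + 1e-12))
             for pt, ps in zip(p_t, p_s))
    # t**2 rescales the gradient magnitude of the softened KL term.
    return (1 - alpha) * ce + alpha * (t ** 2) * kl
```

When the shallow and deep heads agree exactly, the KL term vanishes and only the (weighted) cross-entropy remains, which is a quick sanity check for the implementation.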

Abstract (English)


With the rapid development of deep learning, the industry has begun to manufacture products based on deep learning. Once such a product has been released, it is difficult to modify: if the company wants to add new features, the only option is to re-manufacture the product, which is expensive and time-consuming. Moreover, real-world scenarios often require solving multiple tasks simultaneously, since one product must perform many jobs at the same time. Deploying all of the corresponding models on resource-limited devices is challenging because most of these models require high computational resources. In this thesis, we propose to integrate multiple pre-trained models into a unified structure by exploiting the synergy among them. The challenge is to find the best layer at which to integrate the models so as to balance performance against computational cost. The intuitive approach is a full search that goes through all possible architectures; however, manually designing integrated models and training them iteratively require huge computational resources and a long search time. We therefore propose an Automatic Search Algorithm for Integrated Neural Network, which automatically determines the integrated architecture among multiple candidates in the manner of Neural Architecture Search. In addition, we combine the self-distillation technique with our integrated model to boost performance. Experimental results show the effectiveness of the proposed method.
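The accuracy-versus-cost trade-off driving the search can be sketched as follows. This is a minimal illustration of the idea, not the thesis implementation: two tasks share a common trunk up to a candidate layer k, sharing saves the FLOPs of the duplicated prefix, and each candidate is scored by estimated accuracy minus a cost penalty. The per-layer FLOPs, accuracy estimates, and the weight `lam` are all hypothetical stand-ins for measured values.

```python
def search_integration_point(layer_flops, acc_at_split, lam=1e-9):
    """Exhaustively score every candidate split layer.

    layer_flops:  FLOPs of each backbone layer, in order.
    acc_at_split: {candidate layer k: estimated accuracy when the two
                   tasks share the trunk up to layer k}.
    lam:          weight trading accuracy against compute cost.
    Returns (best_layer, best_score).
    """
    total = sum(layer_flops)
    best = None
    for k, acc in acc_at_split.items():
        shared = sum(layer_flops[:k])   # trunk computed once, not twice
        cost = 2 * total - shared       # two tasks with a shared prefix
        score = acc - lam * cost
        if best is None or score > best[1]:
            best = (k, score)
    return best
```

This exhaustive loop is exactly the "full search" the abstract calls prohibitive at realistic scale, since every candidate accuracy in `acc_at_split` would require training and evaluating a separate integrated model; the proposed algorithm avoids that by determining the architecture in the manner of Neural Architecture Search.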

References


D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba, “Network dissection: Quantifying interpretability of deep visual representations,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6541–6549.
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
H. Liu, K. Simonyan, and Y. Yang, “Darts: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018.
