In this paper, we proposed a novel method of exploiting model parallelism to partition a neural network for distributed inference. By taking advantage of the multi-path structure found in certain model architectures, our partitioning strategy divides the paths into several groups and schedules communication with rule-based methods to reduce the volume of transmitted data. Across groups, computation and communication are overlapped to hide the overhead of data transfer. Moreover, we adopt neural architecture search to sample communication decisions with a learned policy and search for high-accuracy, low-transmission models. The best model we found reduces data transmission by 86.6% compared to the original model, with little impact on accuracy. Under suitable device specifications and model configurations, inference of large neural networks on edge clusters can thus be distributed and accelerated with our approach.
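The core idea of the partitioning strategy can be illustrated with a minimal sketch. This is not the paper's implementation: the round-robin grouping, the toy per-path computation, and all function names here are illustrative assumptions. It shows how the paths of a multi-path block can be split into groups, each group computed independently (e.g., on a separate edge device), with only each group's small merged output transmitted for the final aggregation:

```python
import threading
import queue

def run_path(x, weight):
    # stand-in for one path's computation (e.g., a convolutional branch)
    return [weight * v for v in x]

def device_worker(x, weights, out_q):
    # one "device" computes all paths in its group, then sends only the
    # group's aggregated output; sends from different devices can overlap
    # with other devices' computation
    partial = [0.0] * len(x)
    for w in weights:
        y = run_path(x, w)
        partial = [a + b for a, b in zip(partial, y)]
    out_q.put(partial)

def distributed_block(x, path_weights, n_devices=2):
    # rule-based grouping (assumption): round-robin paths across devices
    groups = [path_weights[i::n_devices] for i in range(n_devices)]
    out_q = queue.Queue()
    threads = [threading.Thread(target=device_worker, args=(x, g, out_q))
               for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # merge point: sum the group outputs, as in aggregating residual paths
    merged = [0.0] * len(x)
    for _ in range(n_devices):
        part = out_q.get()
        merged = [a + b for a, b in zip(merged, part)]
    return merged

x = [1.0, 2.0]
weights = [0.5, 1.5, 2.0, 1.0]  # four parallel paths
print(distributed_block(x, weights))  # → [5.0, 10.0], identical to running all paths locally
```

Because only the per-group aggregate crosses the network instead of every path's activation, the amount of transmitted data shrinks with the number of paths folded into each group.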