Optimizing Communication Volume for the Aggregation of Sparsified Gradients

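Network data transfer is the bottleneck that distributed deep learning runs into as the number of training machines (workers) grows, and one remedy is to compress the exchanged gradients by sparsifying them. We find that, during the exchange, the amount of data the server sends back to the workers shrinks as the gradients sent by the workers become more similar to one another. Preliminary experiments show that only a small fraction of parameters repeatedly produce large gradients within a short period of time. Based on this observation, we propose several algorithms that let workers choose which gradients to send. Experiments confirm that our approach reduces the amount of data sent by the server and shortens the time per training iteration, so the model reaches convergence faster than with conventional compression.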

Keywords

Parallel Processing; Distributed Systems; Deep Learning; Gradient Sparsification

Parallel Abstract

Communication cost is a bottleneck when scaling distributed deep learning to more workers. One solution is gradient sparsification, which compresses the exchanged gradients into a sparse format. We found that the server's send cost, i.e., the size of the aggregated sparse gradient, can be reduced through the way workers select the gradients they send. Based on the observation that only a small fraction of parameters repeatedly produce large gradients within a short period of time, we propose several gradient selection algorithms based on different metrics. Experiments show that the proposed methods reduce the aggregated size sent by the server, and that the resulting reduction in time per iteration lets the model converge faster than with traditional sparsification.

Parallel Keywords

Parallel Processing; Distributed Systems; Deep Learning; Gradient Sparsification
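To illustrate why the server's send cost depends on how similar the workers' selections are, here is a minimal Python sketch. It assumes plain top-k sparsification on each worker and a server that sums the sparse gradients index by index. The dict-based representation and the names `top_k_sparsify`, `aggregate` and `server_send_cost` are hypothetical and do not come from the thesis, and the gradient selection algorithms proposed in the thesis are not reproduced here.

```python
from typing import Dict, List

# Hypothetical representation: a sparse gradient maps parameter index -> value.
SparseGrad = Dict[int, float]


def top_k_sparsify(grad: List[float], k: int) -> SparseGrad:
    """Plain top-k sparsification: keep the k entries with the largest magnitude."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return {i: grad[i] for i in order[:k]}


def aggregate(sparse_grads: List[SparseGrad]) -> SparseGrad:
    """Server side: sum the workers' sparse gradients index by index.

    The result holds one entry for every index chosen by at least one worker,
    so its size equals the size of the union of the workers' index sets.
    """
    total: SparseGrad = {}
    for g in sparse_grads:
        for i, v in g.items():
            total[i] = total.get(i, 0.0) + v
    return total


def server_send_cost(sparse_grads: List[SparseGrad]) -> int:
    """Number of (index, value) pairs the server sends back to each worker."""
    return len(aggregate(sparse_grads))


if __name__ == "__main__":
    w1 = [0.9, -0.8, 0.1, 0.0, 0.05, 0.02]
    w2 = [0.7, -0.95, 0.0, 0.1, 0.03, 0.01]   # large entries at the same indices as w1
    w3 = [0.0, 0.01, 0.7, -0.95, 0.03, 0.02]  # large entries at different indices
    k = 2
    similar = [top_k_sparsify(w1, k), top_k_sparsify(w2, k)]
    dissimilar = [top_k_sparsify(w1, k), top_k_sparsify(w3, k)]
    print(server_send_cost(similar))     # 2
    print(server_send_cost(dissimilar))  # 4
```

In both cases each worker sends k = 2 entries, but the aggregated gradient the server sends back has 2 entries when the workers pick the same indices and 4 when their picks are disjoint. In general, with n workers and d parameters, the aggregated size lies between k and min(n·k, d), which is why steering workers toward overlapping selections can cut the server's traffic.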
