
Vanishing Nodes: The Phenomenon That Affects the Representation Power and the Training Difficulty of Deep Neural Networks

Advisor: 林宗男

Abstract


It is well known that the problem of vanishing/exploding gradients creates a challenge when training deep networks. In this paper, we show that another phenomenon, called vanishing nodes, also increases the difficulty of training deep neural networks. As the depth of a neural network increases, its hidden nodes exhibit increasingly correlated behavior. This correlated behavior results in great similarity between the nodes, so the redundancy of hidden nodes grows as the network becomes deeper. We call this problem "vanishing nodes." The degree of vanishing nodes can be characterized quantitatively from the network parameters; it is shown analytically to be proportional to the network depth and inversely proportional to the network width. Numerical results further suggest that vanishing nodes become more evident during back-propagation training. Finally, we show that vanishing/exploding gradients and vanishing nodes are two distinct challenges that increase the difficulty of training deep neural networks.
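
The abstract describes vanishing nodes as growing correlation, and hence redundancy, among hidden nodes as depth increases. The sketch below is not the thesis's own metric or code; it is a minimal illustration assuming NumPy, a randomly initialized tanh multilayer perceptron with i.i.d. Gaussian weights scaled by 1/sqrt(width), and two illustrative redundancy measures: the mean absolute pairwise correlation between hidden nodes and the participation-ratio effective rank of the node-correlation matrix. All function names are hypothetical.

```python
import numpy as np


def redundancy_stats(acts):
    """Two illustrative redundancy measures for one layer's activations.

    acts: array of shape (num_samples, width), one column per hidden node.
    Returns (mean absolute off-diagonal correlation, effective rank of the
    node-correlation matrix via the participation ratio of its eigenvalues).
    """
    corr = np.corrcoef(acts, rowvar=False)               # (width, width) node-by-node correlations
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]  # drop the diagonal of ones
    mean_abs_corr = float(np.mean(np.abs(off_diag)))
    eigvals = np.linalg.eigvalsh(corr)
    eff_rank = float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))
    return mean_abs_corr, eff_rank


def deep_random_tanh_mlp(depth=40, width=64, num_samples=1024, seed=0):
    """Push random inputs through a deep, randomly initialized tanh MLP and
    record how redundant the hidden nodes look at each layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, width))
    history = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)  # variance-preserving scale
        x = np.tanh(x @ w)
        history.append(redundancy_stats(x))
    return history


if __name__ == "__main__":
    for layer, (corr, erank) in enumerate(deep_random_tanh_mlp(), start=1):
        if layer % 5 == 0:
            print(f"layer {layer:3d}: mean |corr| = {corr:.3f}, effective rank = {erank:5.1f}")
```

Under these assumptions, the reported correlation typically rises and the effective rank typically falls as the layer index grows, and widening the network at a fixed depth should weaken the effect, which is consistent with the abstract's claim that the degree of vanishing nodes scales with depth and inversely with width.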
