
Vanishing Nodes: The Phenomenon That Affects the Representation Power and the Training Difficulty of Deep Neural Networks

Advisor: 林宗男

Abstract


It is well known that the problem of vanishing/exploding gradients creates a challenge when training deep networks. In this paper, we show that another phenomenon, called vanishing nodes, also increases the difficulty of training deep neural networks. As the depth of a neural network increases, its hidden nodes exhibit increasingly correlated behavior. This correlated behavior results in great similarity between the nodes, so the redundancy of hidden nodes grows as the network becomes deeper. We call this problem "vanishing nodes." The degree of vanishing nodes can be characterized quantitatively from the network parameters; it is shown analytically to be proportional to the network depth and inversely proportional to the network width. Numerical results further suggest that vanishing nodes become more evident during back-propagation training. Finally, we show that vanishing/exploding gradients and vanishing nodes are two distinct challenges that increase the difficulty of training deep neural networks.
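
The abstract describes vanishing nodes as growing correlation, and hence redundancy, among hidden nodes as depth increases. The sketch below is not the thesis's own metric or code; it is a minimal illustration assuming NumPy, a randomly initialized tanh multilayer perceptron with i.i.d. Gaussian weights scaled by 1/sqrt(width), and two illustrative redundancy measures: the mean absolute pairwise correlation between hidden nodes and the participation-ratio effective rank of the node-correlation matrix. All function names are hypothetical.

```python
import numpy as np


def redundancy_stats(acts):
    """Two illustrative redundancy measures for one layer's activations.

    acts: array of shape (num_samples, width), one column per hidden node.
    Returns (mean absolute off-diagonal correlation, effective rank of the
    node-correlation matrix via the participation ratio of its eigenvalues).
    """
    corr = np.corrcoef(acts, rowvar=False)               # (width, width) node-by-node correlations
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]  # drop the diagonal of ones
    mean_abs_corr = float(np.mean(np.abs(off_diag)))
    eigvals = np.linalg.eigvalsh(corr)
    eff_rank = float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))
    return mean_abs_corr, eff_rank


def deep_random_tanh_mlp(depth=40, width=64, num_samples=1024, seed=0):
    """Push random inputs through a deep, randomly initialized tanh MLP and
    record how redundant the hidden nodes look at each layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, width))
    history = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)  # variance-preserving scale
        x = np.tanh(x @ w)
        history.append(redundancy_stats(x))
    return history


if __name__ == "__main__":
    for layer, (corr, erank) in enumerate(deep_random_tanh_mlp(), start=1):
        if layer % 5 == 0:
            print(f"layer {layer:3d}: mean |corr| = {corr:.3f}, effective rank = {erank:5.1f}")
```

Under these assumptions, the reported correlation typically rises and the effective rank typically falls as the layer index grows, and widening the network at a fixed depth should weaken the effect, which is consistent with the abstract's claim that the degree of vanishing nodes scales with depth and inversely with width.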
