Recent breakthroughs in Deep Learning have had a great impact on science and technology. The process of training a Deep Neural Network is closely analogous to the relaxation toward equilibrium studied in Statistical Physics. The objective of this thesis is to understand this process using tools from Information Theory. Specifically, we use Mutual Information to analyze phase transitions during training and apply the Information Bottleneck theory to understand how the training dynamics converge to their final state.