
Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Autoencoder

Advisor: 林守德

Abstract


The goal of this paper is to report certain scientific discoveries about a Seq2Seq model. It is known that analyzing the behavior of RNN-based models at the neuron level is more challenging than analyzing DNN or CNN models, because of the recurrent mechanism inherent in RNNs. This paper provides a neuron-level analysis to explain why a vanilla GRU-based Seq2Seq model without attention can successfully output the correct tokens in the correct order with very high accuracy. We found two sets of neurons, storage neurons and count-down neurons, which store token and position information respectively. By analyzing how these two groups of neurons evolve across time steps and how they interact, we uncover the mechanism by which the model produces the right tokens in the right positions.
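
To make the setting concrete, the following is a minimal sketch, not the authors' code: it assumes a PyTorch implementation, and the class name GRUAutoencoder and the helper trace_decoder_neurons are hypothetical. It shows a vanilla GRU-based Seq2Seq autoencoder without attention, plus a routine that records the decoder's hidden state at every time step so individual neurons can be traced in the spirit of the storage-neuron and count-down-neuron analysis above.

import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt_in):
        # Encode the whole input sequence into the final hidden state.
        _, h = self.encoder(self.embed(src))            # h: (1, batch, hid_dim)
        # Decode conditioned only on that single vector (no attention).
        dec_states, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec_states)                     # per-step token logits

    @torch.no_grad()
    def trace_decoder_neurons(self, src, tgt_in):
        # Return decoder hidden activations of shape (batch, T, hid_dim),
        # so each neuron's trajectory over time steps can be examined.
        _, h = self.encoder(self.embed(src))
        dec_states, _ = self.decoder(self.embed(tgt_in), h)
        return dec_states

model = GRUAutoencoder(vocab_size=1000)
src = torch.randint(0, 1000, (4, 10))         # a toy batch of token sequences
acts = model.trace_decoder_neurons(src, src)  # shape (4, 10, 128)

Under this sketch, acts[:, :, j] is neuron j's activation at every decoding step: a unit whose value changes monotonically as the end of the output approaches is a natural count-down candidate, while a unit whose value stays tied to a particular input token is a storage candidate.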

