
A Visual Analytics System for Understanding and Comparing Transformer Models

Advisor: 王科植

Abstract


In recent years, natural language processing (NLP) has made great progress, and transformer-based models perform well on a wide range of NLP problems. However, a given language task can be handled by several different models whose architectures differ slightly, for example in the number of layers and attention heads. Beyond quantitative metrics, many users also weigh a model's grasp of linguistic structure and the computational resources it requires when choosing a model. Yet comparing and deeply analyzing two transformer-based models with different numbers of layers and attention heads is not easy, because there is no inherent one-to-one correspondence between their components. Comparing models with different architectures is therefore a crucial and challenging task when users train, select, or improve models for their NLP tasks. In this paper, we propose a visual analytics system for exploring the differences between language models, helping users select a model or identify where a model could be improved. Our system supports the comparison of two models: users can interactively explore specific layers or heads in each model and identify similarities and differences. With our tool, users can not only see which linguistic features a model has learned, but also analyze in depth the subtle differences between two transformer-based models with different numbers of layers and heads. Use cases and user feedback show that our tool helps people gain insight into models and facilitates model comparison tasks.
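The abstract points out that two models with different numbers of layers and heads lack an inherent one-to-one match between components. As a rough illustration of that underlying problem, and not of the thesis's actual method, the sketch below scores every attention head in one model against every head in another on the same sentence. It is a minimal sketch assuming the Hugging Face transformers library; the model pair (bert-base-uncased vs. distilbert-base-uncased) and the cosine-similarity score are illustrative assumptions.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library; the
# model pair and the cosine-similarity score are illustrative choices, not
# the method used in the thesis.
import torch
from transformers import AutoModel, AutoTokenizer


def attention_maps(model_name: str, text: str) -> torch.Tensor:
    """Stack every attention head's map into one (layers*heads, seq, seq) tensor."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions is a tuple with one (1, heads, seq, seq) tensor per layer.
    return torch.cat(out.attentions, dim=1).squeeze(0)


sentence = "The quick brown fox jumps over the lazy dog."
# Both checkpoints share BERT's tokenizer, so the sequence lengths match.
heads_a = attention_maps("bert-base-uncased", sentence)        # 12 layers x 12 heads = 144
heads_b = attention_maps("distilbert-base-uncased", sentence)  # 6 layers x 12 heads = 72

# Cosine similarity between every head of model A and every head of model B:
# no one-to-one layer/head correspondence is assumed.
a = heads_a.flatten(1)
b = heads_b.flatten(1)
a = a / a.norm(dim=1, keepdim=True)
b = b / b.norm(dim=1, keepdim=True)
similarity = a @ b.T                       # shape: (144, 72)
best_score, best_match = similarity.max(dim=1)
print(similarity.shape, best_score.mean())
```

A real comparison tool would aggregate such scores over many sentences and let users drill into individual layers and heads; the pairwise similarity matrix above is the kind of overview the proposed system presents interactively.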

Keywords

Visualization, Transformer



