  • Degree thesis

Efficient Unseen Language Adaptation for Multilingual Pre-Trained Language Models

Advisor: 陳縕儂 (Yun-Nung Chen)

Abstract


Multilingual pre-trained language models (mPLMs) have demonstrated notable effectiveness in zero-shot cross-lingual transfer tasks. Specifically, they can be fine-tuned solely on tasks in the source language and subsequently applied to tasks in the target language. However, for low-resource languages unseen during pre-training, relying solely on zero-shot cross-lingual transfer often yields sub-optimal results. One common strategy is to continue training mPLMs with a masked language modeling objective on the target language. Nonetheless, this approach can be inefficient due to the need to adjust all parameters for language adaptation. In this thesis, we propose a more efficient solution: soft-prompt tuning for language adaptation. Our experiments demonstrate that with carefully designed prompts, soft-prompt tuning enables mPLMs to achieve effective zero-shot cross-lingual transfer to downstream tasks in previously unseen languages. Notably, we found that prompt tuning outperforms continued-training baselines on two text classification benchmarks, encompassing 18 low-resource languages, while utilizing a mere 0.28% of the tuned parameters. These results underscore the superior adaptability of mPLMs to previously unseen languages afforded by soft-prompt tuning compared to traditional fine-tuning methods.
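The core mechanism behind soft-prompt tuning can be illustrated with a minimal sketch: a sequence of trainable "virtual token" embeddings is prepended to the input, the pre-trained encoder is frozen, and only the prompt parameters receive gradients. The toy encoder, hidden size, and prompt length below are illustrative stand-ins, not the actual mPLM or hyperparameters used in this thesis.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Freeze a pre-trained encoder; train only prepended soft-prompt embeddings.

    Illustrative sketch: `encoder` here is a toy Transformer layer standing in
    for a real mPLM such as XLM-R; `prompt_len` and `hidden` are arbitrary.
    """

    def __init__(self, encoder: nn.Module, hidden_size: int, prompt_len: int = 8):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep all pre-trained weights fixed
        # trainable prompt: prompt_len virtual-token embeddings
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # expand the shared prompt across the batch and prepend it
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, token_embeds], dim=1))

hidden = 32
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=1,
)
model = SoftPromptModel(encoder, hidden, prompt_len=8)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total} ({trainable / total:.4%})")
```

Because only `prompt_len x hidden_size` parameters are updated, the trainable fraction shrinks as the frozen backbone grows, which is how the tuned-parameter budget can stay far below that of full fine-tuning.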
