
Systematic Analysis of Pre-trained Language Model Priming for Parameter-efficient Fine-tuning

Advisor: 李宏毅 (Hung-yi Lee)

Abstract


As the parameter size of pre-trained language models (PLMs) continues to grow, parameter-efficient fine-tuning becomes increasingly important. However, in few-shot learning settings its effectiveness falls far short of fine-tuning the entire pre-trained model. To address this issue, this study proposes a training stage called "priming," applied before parameter-efficient fine-tuning, that strengthens the pre-trained language model for the downstream fine-tuning. The effectiveness of this method is verified on a few-shot benchmark consisting of 160 different natural language processing tasks. Compared with directly performing parameter-efficient fine-tuning, the primed model achieves an improvement of nearly 30% in Average Relative Gain (ARG) and outperforms other parameter-efficient fine-tuning baselines. In addition, we conduct systematic experiments on the priming procedure, analyzing how different training algorithms and different choices of upstream trainable parameters affect its effectiveness, and identify the most effective priming method. The results of this study enhance the performance of parameter-efficient fine-tuning in few-shot learning and make the fine-tuning and use of large-scale pre-trained language models more efficient.
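The abstract describes a two-stage procedure, upstream "priming" followed by downstream parameter-efficient fine-tuning, but does not specify the priming algorithm or which parameters are trained at each stage. The sketch below only illustrates that general two-stage shape under assumed choices (a toy backbone, a residual adapter as the parameter-efficient module, full backbone updates during priming); every name in it (TinyBackbone, Adapter, priming_stage, peft_stage) is hypothetical and not taken from the thesis.

# Minimal sketch (not the thesis' implementation): a backbone is first
# "primed" on upstream data with full updates, then frozen so that only a
# small adapter is trained on the few-shot downstream task.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):          # stand-in for a pre-trained encoder
    def __init__(self, dim=32):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.layers(x)

class Adapter(nn.Module):               # the parameter-efficient module
    def __init__(self, dim=32, bottleneck=4):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))   # residual adapter

def priming_stage(backbone, upstream_batches, lr=1e-3):
    """Upstream 'priming': update the backbone itself on source-task data."""
    opt = torch.optim.Adam(backbone.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for x, y in upstream_batches:
        opt.zero_grad()
        loss_fn(backbone(x), y).backward()
        opt.step()

def peft_stage(backbone, adapter, fewshot_batches, lr=1e-3):
    """Downstream PEFT: backbone frozen, only the adapter receives gradients."""
    for p in backbone.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(adapter.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for x, y in fewshot_batches:
        opt.zero_grad()
        loss_fn(adapter(backbone(x)), y).backward()
        opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    backbone, adapter = TinyBackbone(), Adapter()
    fake = [(torch.randn(8, 32), torch.randn(8, 32)) for _ in range(4)]
    priming_stage(backbone, fake)         # stage 1: priming
    peft_stage(backbone, adapter, fake)   # stage 2: few-shot PEFT

In the few-shot stage only the adapter's parameters are updated, which is what makes the fine-tuning parameter-efficient; the priming stage is the extra step the thesis adds before it.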

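The abstract reports an improvement of nearly 30% in ARG without defining the metric. A common reading, assumed here rather than taken from the thesis, is the per-task gain relative to a reference method (e.g., direct parameter-efficient fine-tuning), averaged over all evaluation tasks:

# Hedged sketch of ARG (Average Relative Gain) under the assumption above:
# the mean, over tasks, of each task's relative improvement versus a
# reference score, reported as a percentage.
def average_relative_gain(method_scores, reference_scores):
    """method_scores, reference_scores: dicts mapping task name -> score."""
    gains = [
        (method_scores[task] - reference_scores[task]) / reference_scores[task]
        for task in reference_scores
    ]
    return 100.0 * sum(gains) / len(gains)

# Toy usage: a primed model vs. direct parameter-efficient fine-tuning.
print(average_relative_gain({"task_a": 0.66, "task_b": 0.52},
                            {"task_a": 0.60, "task_b": 0.40}))   # prints 20.0

Under this definition the gain is relative rather than absolute, so tasks on which the reference method scores poorly contribute disproportionately large terms to the average.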

