
Generalization Capability of Deep Learning in Natural Language Processing

Advisor: Lin-shan Lee (李琳山)

Abstract


This thesis studies the generalization capability of deep learning in natural language processing, covering both cross-lingual and cross-format generalization. First, we investigate the origins of the cross-lingual ability of large pre-trained models by controlling the size of the pre-training corpus and the length of the input sequences. Experiments show that a sufficiently large corpus is one of the key factors for acquiring cross-lingual ability; pre-training on a smaller corpus clearly degrades it. Moreover, even with a large corpus, sufficiently long input sequences are necessary; shorter inputs lower downstream-task performance. We further introduce the notion of a language representation hidden inside large pre-trained models, propose a simple method that averages contextualized token embeddings to obtain it, and use word-level unsupervised translation to verify that this representation behaves as intended. We then propose two methods that exploit this language representation to improve downstream performance. Experiments show that both methods yield clear gains on cross-lingual sentence retrieval and cross-lingual transfer learning, while being easy to apply and requiring little computation. In the second part of this thesis, we study how to transfer, through unsupervised learning, what a model has learned from extractive question answering to multiple-choice question answering. We use the prior knowledge the model acquired from extractive QA to extract a candidate set, and let the multiple-choice model learn from this set how to pick the correct answer. Although the candidate set is not guaranteed to contain the correct answer and mostly consists of wrong answers, experiments show that the model can still learn to answer from it. The proposed method outperforms the baseline methods by a clear margin on both the MC500 and RACE datasets.
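
To make the averaging idea above concrete, the following is a minimal sketch, not the thesis's exact procedure: it treats the mean of multilingual BERT's contextualized token embeddings over a small monolingual sample as the language representation, then shifts English embeddings by the difference of the two language means and looks up nearest tokens as a crude word-level translation. The toy corpora, the use of the last hidden layer, and decoding against the input embedding matrix are all illustrative assumptions.

# Sketch: language representation by averaging contextualized embeddings,
# and unsupervised token translation by shifting embeddings (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def mean_embedding(sentences):
    """Average all contextualized token embeddings over a monolingual sample."""
    vecs = []
    with torch.no_grad():
        for s in sentences:
            inputs = tokenizer(s, return_tensors="pt", truncation=True)
            hidden = model(**inputs).last_hidden_state[0]    # (seq_len, 768)
            vecs.append(hidden.mean(dim=0))
    return torch.stack(vecs).mean(dim=0)                     # "language representation"

en_mean = mean_embedding(["The cat sat on the mat.", "I like apples."])  # toy English sample
zh_mean = mean_embedding(["貓坐在墊子上。", "我喜歡蘋果。"])                # toy Chinese sample

with torch.no_grad():
    inputs = tokenizer("I like apples.", return_tensors="pt")
    hidden = model(**inputs).last_hidden_state[0]
    shifted = hidden + (zh_mean - en_mean)                    # move English embeddings toward Chinese
    word_emb = model.get_input_embeddings().weight            # (vocab_size, 768); decoding choice is an assumption
    nearest = torch.cdist(shifted, word_emb).argmin(dim=-1)   # nearest vocabulary entry per position
    print(tokenizer.convert_ids_to_tokens(nearest.tolist()))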

English Abstract


This thesis focuses on the generalization capability of deep learning in natural language processing, especially cross-lingual and cross-format ability. In the first part, we provide an in-depth experimental study to supplement the existing literature on cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data, and find that data size and context window size are crucial factors for transferability. We then find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of that language. Given this language representation, we control multilingual BERT’s output languages by manipulating the token embeddings, thus achieving unsupervised token translation. Based on this observation, we further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT. In the second part, we study the possibility of unsupervised Multiple-Choice Question Answering (MCQA). From fundamental knowledge, the MCQA model knows that some choices have higher probabilities of being correct than others. This information, though very noisy, guides the training of the MCQA model. The proposed method is shown to outperform the baseline approaches on RACE and is even comparable with some supervised learning approaches on MC500.
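
For the cross-format part, here is a minimal sketch under stated assumptions of how noisy guidance from an extractive QA model could label multiple-choice data: an off-the-shelf SQuAD-trained checkpoint (assumed here, not necessarily the one used in the thesis) proposes a candidate set of answer spans, and the choice with the largest overlap with any candidate is taken as a pseudo-label for training an MCQA model. The overlap scoring, the candidate-set size k, and the checkpoint name are illustrative assumptions rather than the thesis's exact recipe.

# Sketch: pseudo-labelling multiple-choice questions with an extractive QA model.
from transformers import pipeline

extractive_qa = pipeline("question-answering",
                         model="distilbert-base-cased-distilled-squad")  # assumed checkpoint

def pseudo_label(question, passage, choices, k=5):
    """Pick the choice with the largest token overlap with any extracted candidate span."""
    candidates = extractive_qa(question=question, context=passage, top_k=k)
    spans = [set(c["answer"].lower().split()) for c in candidates]

    def overlap(choice):
        tokens = set(choice.lower().split())
        return max((len(tokens & span) for span in spans), default=0)

    # Index of the best-overlapping choice; used as a noisy training label
    # for an MCQA model (e.g. BertForMultipleChoice), with no human MCQA labels.
    return max(range(len(choices)), key=lambda i: overlap(choices[i]))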

