
Probing Contextualized Word Embedding with Definition

Advisor: Yun-Nung Chen (陳縕儂)

Abstract


This thesis aims to investigate the sense information encoded in contextualized word embeddings and to present it in human-readable natural language. While contextualized word embeddings have improved many natural language processing tasks, what information the representations themselves have learned remains an open question. This thesis explores different pretrained contextualized word embeddings to examine whether suitable word definitions can be distilled from them. Specifically, given a multi-sensed target word, an example sentence containing it, and its dictionary definition, we want to know how acceptable a definition can be produced from the evaluated contextualized embedding. We propose a framework that accommodates different embedding types and trains a mapping between two semantically continuous spaces, namely the space of contextualized word embeddings and the space of definition representations. The algorithm reformulates the original generation problem as a classification problem to avoid the difficulty of natural language generation, which substantially improves the results while preserving the ability to present definitions in human-readable language. We also validate the effectiveness of the proposed framework on a word sense discrimination subtask. Another goal of the reformulation is to provide a more reasonable way to probe pretrained contextualized word embeddings: previous work often produces poor definitions through a poorly trained decoder and therefore cannot clearly reflect the problems of the evaluated embeddings themselves. In contrast, our framework retrieves definitions from a human-written dictionary, and our mapping model serves as a probe to assess the linguistic knowledge contained in, and the limitations of, pretrained contextualized word embeddings. We find that BERT appears to carry richer sense information than ELMo, and we list problems shared by both. Our observations may help clarify what is captured and what is lost in contextualized representations.
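A minimal PyTorch sketch of the reformulated task described above may make the setup concrete: instead of generating a definition, a learned mapping projects the contextualized embedding of the target word into the definition space and scores candidate dictionary definitions as classes. The module name `DefinitionProbe`, the dimensions, and the choice of a single linear projection are illustrative assumptions, not the exact architecture of the thesis.

```python
# Sketch only: a learned mapping between the two semantically continuous spaces
# (contextualized word embeddings -> definition representations), with definition
# selection treated as classification over candidate dictionary definitions.
import torch
import torch.nn as nn

class DefinitionProbe(nn.Module):
    def __init__(self, word_dim: int, def_dim: int):
        super().__init__()
        # Assumed architecture: a single linear projection into the definition space.
        self.proj = nn.Linear(word_dim, def_dim)

    def forward(self, word_emb: torch.Tensor, def_embs: torch.Tensor) -> torch.Tensor:
        """
        word_emb: (batch, word_dim)   contextualized embedding of the target word
        def_embs: (num_defs, def_dim) precomputed embeddings of candidate definitions
        returns : (batch, num_defs)   a score for each candidate definition
        """
        mapped = self.proj(word_emb)      # (batch, def_dim)
        return mapped @ def_embs.t()      # dot-product similarity scores

# Training treats the index of the ground-truth dictionary definition as a class label.
probe = DefinitionProbe(word_dim=768, def_dim=512)
loss_fn = nn.CrossEntropyLoss()
word_emb = torch.randn(4, 768)        # stand-in for embeddings from BERT/ELMo
def_embs = torch.randn(100, 512)      # stand-in for encoded dictionary definitions
labels = torch.randint(0, 100, (4,))  # index of the correct definition per example
loss = loss_fn(probe(word_emb, def_embs), labels)
loss.backward()
```

Because the probe retrieves a human-written definition rather than decoding one token by token, any failure to rank the correct sense highly can be attributed to the evaluated embedding rather than to a weak generator.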

Keywords

Neural networks; word embeddings; word sense disambiguation

Parallel Abstract (English)


The main purpose of this thesis is to investigate the sense information encoded in contextualized word representations through human-readable definitions. While contextualized word embeddings have boosted many downstream NLP tasks compared with traditional static ones, what has been learned in these representations remains an open question. In this thesis, we explore different kinds of contextualized word embeddings to see whether suitable definitions can be distilled from these pretrained representations. Specifically, given a multi-sensed target word to be defined, the context containing the target word, and the ground-truth definition from the dictionary, we would like to see how well the evaluated contextualized embeddings of the target word can produce acceptable definitions. We propose a framework that readily incorporates different embedding types, with the algorithm learning a mapping between two semantically continuous spaces: the space of word representations and the space of definitions. The algorithm reformulates the traditional definition modeling task, aiming to avoid the difficulty of natural language generation by transforming it into a classification task, which significantly improves the performance while maintaining the ability to provide human-readable definitions. The main goal of our reformulation is to provide a more reasonable manner to assess the given pretrained contextualized word embeddings, as the unsatisfying definitions previously generated by a poorly trained decoder cannot clearly reflect the problems of the evaluated word embeddings. Instead, our framework retrieves definitions from a well-written dictionary, and our mapping serves as a probe to explore the inherent linguistic knowledge and the limitations of the pretrained contextualized word embeddings. We find that BERT appears to be more sense-informative than ELMo, and we list some shortcomings of both. Our observations may help better understand what is captured and what is lost in contextualized representations.
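To make the probing setup concrete, the sketch below extracts a contextualized embedding for one occurrence of a target word from a pretrained encoder; this vector is what would be fed to the mapping described above. The use of the Hugging Face transformers library, the bert-base-uncased checkpoint, and the final hidden layer are assumptions for illustration only; the thesis also evaluates ELMo, whose representations would be extracted analogously.

```python
# Sketch only: pull the contextualized embedding of a multi-sensed target word
# ("bank") out of a pretrained encoder for use as the probe's input.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

context = "She sat on the bank of the river and watched the water."
target = "bank"

enc = tokenizer(context, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)

# Locate the target word's (first) sub-token and take its vector as the
# contextualized representation of this particular occurrence.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
idx = tokens.index(target)
target_embedding = hidden[idx]                   # (hidden_dim,) -> input to the probe
print(target_embedding.shape)
```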

