

Learning Crosslingual and Explainable Sense-Level Word Representations

Advisor: 陳縕儂 (Yun-Nung Chen)

Abstract


The main goal of this thesis is to address the problem of sense representations, since senses are the most precise units of meaning. Two problems are tackled. The first is learning sense representations that are well aligned in the vector space, since such alignment can serve as a good initialization for downstream tasks such as unsupervised machine translation. The second is providing interpretations of sense representations, so that we can better understand what the complex embedding vectors actually mean.

The first part of this thesis proposes a modularized sense-selection and representation learning model that jointly learns bilingual sense embeddings with good alignment properties in the vector space. The model exploits a sentence-level parallel corpus to capture the characteristics of the language pair. It achieves strong results on both SCWS and BCWS, where BCWS, introduced in this thesis, is the first high-quality benchmark for evaluating the quality of cross-lingual sense representations.

The second part of this thesis addresses the interpretability of sense representations, both at the dimension level and through generated textual explanations. Given a context and pretrained sense representations, the model first projects them into a high-dimensional vector space and selects the dimensions that best fit the context, so that the meaning of the target word can be represented by its nearest neighbors. Finally, the model uses a recurrent neural network to generate human-readable textual definitions to improve interpretability. A high-quality dataset is also introduced for training polysemous word representations and for word sense disambiguation. Experimental results show that the model performs well on both BLEU score and human evaluation.

As for future work, vectors that encode contextual sentence meaning could be used to obtain better sense selection and sense representations, and could be compared against the sense representations proposed in this thesis. In addition, explaining which preceding words lead the model to its current sense selection is another direction for interpretability.
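To make the sense-selection idea above more concrete, below is a minimal, monolingual sketch in the spirit of multi-sense skip-gram [5]: each word owns K candidate sense vectors, the sense closest to the averaged context embedding is selected, and only that sense vector receives a skip-gram-with-negative-sampling update. All names, sizes, and ids here are illustrative assumptions, not the thesis implementation; the full bilingual model additionally ties the English and Chinese sense spaces together through the sentence-aligned parallel corpus.

```python
import numpy as np

# Minimal, monolingual sketch of context-based sense selection in the
# spirit of multi-sense skip-gram [5]; hypothetical names and sizes,
# not the thesis implementation.
VOCAB, K, DIM = 10_000, 3, 100
rng = np.random.default_rng(0)
sense_vecs = rng.normal(scale=0.1, size=(VOCAB, K, DIM))  # K sense vectors per word
context_vecs = rng.normal(scale=0.1, size=(VOCAB, DIM))   # shared context vectors

def select_sense(word_id, context_ids):
    """Pick the sense of `word_id` closest to the mean context embedding."""
    ctx = context_vecs[context_ids].mean(axis=0)
    return int((sense_vecs[word_id] @ ctx).argmax())

def sgns_update(word_id, sense_id, context_ids, neg_ids, lr=0.025):
    """One skip-gram-with-negative-sampling step on the selected sense only."""
    v = sense_vecs[word_id, sense_id]  # view: updates write back in place
    for cid, label in [(c, 1.0) for c in context_ids] + [(n, 0.0) for n in neg_ids]:
        u = context_vecs[cid].copy()
        g = lr * (label - 1.0 / (1.0 + np.exp(-(u @ v))))  # gradient coefficient
        context_vecs[cid] += g * v
        v += g * u

# Toy usage: select a sense for word 7 given context words 1, 2, 3,
# then update that sense against the context and two negative samples.
s = select_sense(7, [1, 2, 3])
sgns_update(7, s, context_ids=[1, 2, 3], neg_ids=[42, 99])
```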

Parallel Abstract


The main purpose of this thesis is to solve problems related to sense representations, since senses are the basic, fine-grained semantic units. First, we try to align a set of cross-lingual sense representations in the vector space, which may serve as a good initialization for downstream tasks such as unsupervised machine translation. Second, we aim to provide interpretability for sense representations based on their contexts, so that we can explain the inherently dense and complicated embeddings.

The first part of this thesis proposes a modularized sense induction and representation learning model that jointly learns bilingual sense embeddings aligned well in the vector space, where the cross-lingual signal in an English-Chinese parallel corpus is exploited to capture the collocational and distributional characteristics of the language pair. The model is evaluated on the Stanford Contextual Word Similarity (SCWS) dataset to ensure the quality of the monolingual sense embeddings. In addition, we introduce Bilingual Contextual Word Similarity (BCWS), a large, high-quality dataset for evaluating cross-lingual sense embeddings and the first attempt at measuring whether the learned embeddings are indeed well aligned in the vector space. The proposed approach shows superior quality of sense embeddings evaluated in both monolingual and bilingual spaces.

The second part of this thesis focuses on interpreting sense representations from several aspects, including sense separation across vector dimensions and definition generation. Specifically, given a context together with a target word, our algorithm first projects the target word embedding into a high-dimensional sparse vector and picks the specific dimensions that best explain the semantic meaning of the target word according to the encoded contextual information, from which the sense of the target word can be indirectly inferred. Finally, our algorithm applies an RNN to generate a textual definition of the target word in human-readable form, which enables direct interpretation of the corresponding word embedding. We also introduce a large, high-quality context-definition dataset consisting of sense definitions together with multiple example sentences per polysemous word, a valuable resource for definition modeling [1] and word sense disambiguation. The conducted experiments show superior performance in both BLEU score and human evaluation.

As for future work, contextualized representations can be used to encode sentence information for better sense induction and representations, and the difference between them and our proposed sense representations is also worth exploring. Finally, endowing the model with the capacity to explain which words it attends to when inducing the sense of the current word is another potential research direction.
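As a rough illustration of the projection-and-selection step described above, the sketch below maps a dense embedding into a high-dimensional non-negative space through a stand-in dictionary matrix and keeps only the sparse dimensions most activated by the context; the masked vector could then be compared against other words by nearest neighbour to read off the induced sense. The dictionary `D`, the ReLU projection, and the element-wise relevance score are assumptions for illustration, not the thesis's trained components.

```python
import numpy as np

# Rough sketch of sparse projection plus context-driven dimension
# selection; D and the relevance score are illustrative assumptions,
# not the thesis's trained components.
DIM_DENSE, DIM_SPARSE = 100, 1000
rng = np.random.default_rng(1)
D = np.abs(rng.normal(size=(DIM_DENSE, DIM_SPARSE)))  # stand-in overcomplete dictionary

def project_sparse(x):
    """Project a dense vector into a non-negative high-dimensional space."""
    return np.maximum(0.0, x @ D)  # ReLU encourages sparsity

def pick_dimensions(word_vec, ctx_vec, top_k=5):
    """Keep the sparse dimensions of the word most activated by the context."""
    sw, sc = project_sparse(word_vec), project_sparse(ctx_vec)
    relevance = sw * sc                   # element-wise agreement with context
    top = np.argsort(relevance)[-top_k:]  # context-selected dimensions
    masked = np.zeros_like(sw)
    masked[top] = sw[top]
    return masked

# Usage: the masked sparse vector can be matched against other words'
# sparse vectors by nearest neighbour to infer the contextual sense.
word_vec = rng.normal(size=DIM_DENSE)  # pretrained target-word embedding
ctx_vec = rng.normal(size=DIM_DENSE)   # encoded context representation
print(np.nonzero(pick_dimensions(word_vec, ctx_vec))[0])
```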

References


[1] T. Noraset, C. Liang, L. Birnbaum, and D. Downey, “Definition modeling: Learning to define word embeddings in natural language,” in Proceedings of AAAI, 2017.
[2] A. Gadetsky, I. Yakubovskiy, and D. Vetrov, “Conditional generators of words definitions,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 266–271, 2018.
[3] S. Upadhyay, K.-W. Chang, M. Taddy, A. Kalai, and J. Zou, “Beyond bilingual: Multi-sense word embeddings using multilingual context,” in Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 101–110, 2017.
[4] J. Reisinger and R. J. Mooney, “Multi-prototype vector-space models of word meaning,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 109–117, Association for Computational Linguistics, 2010.
[5] A. Neelakantan, J. Shankar, A. Passos, and A. McCallum, “Efficient non-parametric estimation of multiple embeddings per word in vector space,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.
