Generating Better Word and Sense Embeddings Using Ontologies and Retrofitting Techniques

As natural language processing tasks multiply, the need for better distributed representations of words (word embeddings) and word senses (sense embeddings) has grown in recent years. In this study, we first examine the problem of abnormal dimensions in word embeddings and then propose models that combine word embeddings with an ontology. The combination is discussed in three parts: a direct combination approach, a support vector regression approach, and a retrofitting approach. For sense embeddings, we first propose a joint sense retrofitting model that learns better sense embeddings from contextual and ontological information, and then generalize the proposed model.

Keywords

word embedding; sense embedding; ontology; semantic relatedness

