
數種結合詞向量與字典資源之方法用於字義相似度測量

Some approaches of combining word embedding and lexical resource for semantic relatedness measurement

Advisor: Hsin-Hsi Chen (陳信希)

Abstract


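This thesis proposes three approaches to computing semantic relatedness: (1) removing or adjusting abnormal dimensions in GloVe word vectors to improve performance; (2) linearly combining distance information from WordNet with word vectors; and (3) using word vectors together with twelve kinds of information extracted from WordNet as features for supervised learning with support vector regression (SVR). Experiments are conducted on six benchmark datasets, using the Pearson and Spearman correlation coefficients to measure how well the outputs of the proposed methods agree with the gold-standard annotations, and the results are compared with three recently proposed methods for computing semantic relatedness. The experimental results show that the proposed methods outperform these three methods on several of the benchmark datasets.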
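As a rough illustration of the first approach, the sketch below drops "abnormal" dimensions from a set of word vectors before computing cosine similarity. The abstract does not say how abnormal dimensions are identified, so the criterion used here (a dimension whose mean absolute value is a z-score outlier relative to the other dimensions) is purely an assumption, as are the function names, the threshold, and the toy vectors.

```python
import math
from statistics import mean, pstdev


def find_abnormal_dims(vectors, z_threshold=2.0):
    """Return indices of dimensions whose mean absolute value is an outlier.

    Assumed criterion (not taken from the thesis): a dimension is flagged
    when its mean |value| lies more than z_threshold population standard
    deviations above the average over all dimensions.
    """
    dims = len(vectors[0])
    mean_abs = [mean(abs(v[d]) for v in vectors) for d in range(dims)]
    mu, sigma = mean(mean_abs), pstdev(mean_abs)
    if sigma == 0:
        return []
    return [d for d, m in enumerate(mean_abs) if (m - mu) / sigma > z_threshold]


def drop_dims(vector, dims_to_drop):
    """Return a copy of the vector without the listed dimensions."""
    drop = set(dims_to_drop)
    return [x for d, x in enumerate(vector) if d not in drop]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy 8-dimensional vectors (made up); dimension 3 dominates every vector.
vectors = [
    [0.1, -0.2, 0.3, 4.0, 0.2, -0.1, 0.3, 0.1],
    [0.2, 0.1, -0.3, 4.2, -0.1, 0.2, 0.1, -0.3],
    [-0.3, 0.2, 0.1, 3.9, 0.3, 0.1, -0.2, 0.2],
]

abnormal = find_abnormal_dims(vectors)
cleaned = [drop_dims(v, abnormal) for v in vectors]

print("abnormal dimensions:", abnormal)
print("cosine before:", cosine(vectors[0], vectors[1]))
print("cosine after: ", cosine(cleaned[0], cleaned[1]))
```

In this toy setting the dominant dimension makes every pair of vectors look almost identical; removing it lets the remaining dimensions determine the similarity. Whether the thesis removes, rescales, or otherwise transforms such dimensions is not stated in the abstract.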

Keywords

semantic relatedness, word embedding, WordNet, GloVe, Word2Vec

Parallel Abstract


In this thesis, we propose three approaches to measuring semantic relatedness: (1) boost the performance of GloVe word embeddings by removing or transforming abnormal dimensions; (2) linearly combine the path information extracted from WordNet with the word embeddings; (3) use word embeddings and twelve pieces of linguistic information extracted from WordNet as features for support vector regression. We conduct our experiments on six benchmark data sets. The evaluation computes the Pearson and Spearman correlations between the output of our methods and the ground truth. We report our results together with those of three state-of-the-art approaches. The experimental results show that our methods outperform the state-of-the-art approaches on most of the benchmark data sets.
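The second approach and the evaluation protocol can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the mixing weight `alpha`, the function names, and all toy scores are assumptions, and the WordNet-based scores are taken as precomputed numbers rather than derived from WordNet here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def combine(cos_sim, wn_sim, alpha=0.5):
    """Linear combination of an embedding score and a WordNet-based score.

    alpha is a hypothetical mixing weight; the abstract does not give one.
    """
    return alpha * cos_sim + (1 - alpha) * wn_sim


# Toy scores for four word pairs (all values are made up for illustration).
cos_sims = [0.9, 0.2, 0.6, 0.1]    # embedding cosine similarities
wn_sims = [0.5, 0.1, 0.33, 0.08]   # WordNet path-based scores
gold = [9.0, 2.5, 7.0, 1.0]        # human relatedness ratings

combined = [combine(c, w) for c, w in zip(cos_sims, wn_sims)]
print("Pearson: ", pearson(combined, gold))
print("Spearman:", spearman(combined, gold))
```

Pearson rewards a linear relationship between predicted and gold scores, while Spearman only checks that the two rankings agree, which is why relatedness benchmarks commonly report both.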

Parallel Keywords

semantic relatedness, word embedding, WordNet, GloVe, Word2Vec

