

def2topic: Learning to Classify Word Sense Definitions into Topics

Advisor: 張俊盛


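This thesis proposes a method for assigning topic labels to dictionary definitions, where all topic labels are extracted from a synonym thesaurus. In our approach, sense definitions are converted into vectors, and machine learning and deep learning models then select suitable topic labels from the thesaurus. The method involves automatically extracting features to convert definitions into vectors, automatically assigning senses to the members of groups of related words to generate training data, and automatically learning to classify each definition into topics. At run time, input definitions are converted into word embeddings and ranked by relevance using deep learning (DL) techniques. We present a prototype system, def2topic, which applies the method to the Cambridge English-Chinese Dictionary. Evaluation shows that the proposed system performs significantly better than the baseline system.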


We introduce a method for learning to assign multiple topic categories to a given sense definition, where the topic categories are extracted from a synonym thesaurus. In our approach, sense definitions are transformed into vectors that provide a similarity measure for disambiguating the synonyms in a given thesaurus, which in turn yields training data. The method involves automatically extracting features for converting definitions into vectors, automatically determining the intended senses of the members of a group of related words to generate training data, and automatically learning to classify definitions into topics. At run time, input definitions are transformed into embeddings for classification into categories. We present a prototype system, def2topic, that applies the method to the Cambridge English-Chinese Dictionary. Evaluation on two sets of sense definitions shows that the system significantly outperforms the baseline.
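
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: bag-of-words counts stand in for the learned embeddings, cosine-similarity ranking stands in for the trained classifier, and every function name, topic name, and sample definition below is a hypothetical chosen for illustration.

```python
import math
from collections import Counter

# Assumption: a tiny hand-picked stopword list, only for this toy example.
STOPWORDS = {"a", "an", "the", "of", "or", "to", "that", "in", "for", "and", "is", "on", "with"}

def embed(definition):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    tokens = [t.strip(".,;()").lower() for t in definition.split()]
    return Counter(t for t in tokens if t and t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def label_group(group, senses):
    """For each word in a thesaurus group, keep the sense definition that is
    most similar to the definitions of the other group members."""
    chosen = {}
    for word in group:
        context = Counter()
        for other in group:
            if other != word:
                for d in senses[other]:
                    context.update(embed(d))
        chosen[word] = max(senses[word], key=lambda d: cosine(embed(d), context))
    return chosen

def rank_topics(definition, training):
    """Rank topics by similarity between the input definition and the summed
    vectors of each topic's training definitions."""
    vec = embed(definition)
    scores = {}
    for topic, defs in training.items():
        centroid = Counter()
        for d in defs:
            centroid.update(embed(d))
        scores[topic] = cosine(vec, centroid)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical dictionary senses and thesaurus groups (not data from the thesis).
senses = {
    "bank": ["an organization where people keep money",
             "sloping land along the side of a river"],
    "shore": ["the land along the edge of a sea, lake, or river"],
    "coast": ["the land next to or close to the sea"],
    "lender": ["a person or organization that lends money"],
}
thesaurus = {"LAND_BY_WATER": ["bank", "shore", "coast"], "FINANCE": ["bank", "lender"]}

# Step 1: disambiguate group members to generate (definition, topic) training data.
training = {topic: list(label_group(words, senses).values())
            for topic, words in thesaurus.items()}

# Step 2: at run time, rank the topics for an unseen definition.
print(rank_topics("a place where money is kept and borrowed", training))
# prints ['FINANCE', 'LAND_BY_WATER']
```

In this toy setup the ambiguous word "bank" receives its river sense in the LAND_BY_WATER group and its financial sense in the FINANCE group, which is the kind of automatic sense assignment the abstract describes for generating training data.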


word sense disambiguation; topic classification


