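This paper proposes a method for assigning topic labels to dictionary definitions, where all topic labels are extracted from a synonym thesaurus. In our approach, word senses are converted into vectors, and machine learning and deep learning models then select suitable topic labels from the thesaurus. The method involves automatically extracting features to convert different definitions into vectors, automatically determining the senses of the members of related word groups to generate training data, and automatically learning to classify each definition by topic. At run time, an input definition is converted into word embeddings, and candidate topics are ranked by relevance using deep learning (DL) techniques. We present a prototype system, def2topic, that applies the method to the Cambridge English-Chinese Dictionary. Evaluation shows that the proposed system clearly outperforms the baseline system.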
We introduce a method for learning to determine multiple topic categories for a given sense definition, where the topic categories are extracted from a synonym thesaurus. In our approach, sense definitions are transformed into vectors that provide a similarity measure for disambiguating the synonyms in a given thesaurus, which in turn yields training data. The method involves automatically extracting features for converting different definitions into vectors, automatically determining the intended senses of the members of a group of related words to generate training data, and automatically learning to classify definitions into topics. At run time, input definitions are transformed into embeddings for classification into categories. We present a prototype system, def2topic, that applies the method to the Cambridge English-Chinese Dictionary. Evaluation on two sets of sense definitions shows that the system significantly outperforms the baseline.
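To make the run-time step concrete, the following is a minimal sketch of ranking topic categories for an input definition by vector similarity. It is not the def2topic implementation: bag-of-words count vectors stand in for the learned embeddings, cosine similarity stands in for the trained classifier, and the names `embed`, `rank_topics`, and `TOPIC_EXAMPLES`, as well as the sample topics and definitions, are hypothetical.

```python
import math
from collections import Counter


def embed(definition: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(definition.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_topics(definition: str, topic_examples: dict) -> list:
    """Rank topics by the best similarity between the input definition
    and any example definition already labeled with that topic."""
    query = embed(definition)
    scores = {
        topic: max(cosine(query, embed(example)) for example in examples)
        for topic, examples in topic_examples.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical training data: thesaurus topics paired with sense definitions.
TOPIC_EXAMPLES = {
    "Finance": ["an organization where people keep and borrow money"],
    "Geography": ["the land along the side of a river or lake"],
}

if __name__ == "__main__":
    print(rank_topics("sloping land beside a river", TOPIC_EXAMPLES))
```

In a fuller system, one would presumably replace `embed` with learned definition embeddings and the similarity ranking with a trained classifier; the sketch only shows the shape of the pipeline, from definition to vector to ranked topics.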