
發展以WordNet 為本的詞彙語意資料集

Developing a Word Sense Dataset Based on WordNet Hierarchy

Advisor: 張俊盛

Abstract


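This thesis proposes a method that uses WordNet and a parallel corpus to disambiguate English word senses, supplying WordNet with sense-relevant translations and bilingual example sentences. The results can support English learning through the learner's native language (e.g., Chinese) and can also be applied to other word sense disambiguation problems. Different senses of a polysemous word are usually translated into different words in another language, and our method relies mainly on this property to determine senses. The method involves extracting the mutually corresponding translations of English words from a parallel corpus, training a classifier that distinguishes senses through their different translations, and finally selecting representative example sentences for each sense. We develop the extracted lexical semantic data into a search system, LanguageNet, which lets users look up bilingual synonyms and usage examples for the different senses of a polysemous word. The evaluation shows that the proposed method achieves reasonably good accuracy in extracting sense-relevant translations and bilingual example sentences.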

Parallel Abstract


We introduce a method for WordNet-based word sense disambiguation using a parallel corpus. The method provides accurate sense-relevant translations and bilingual examples that support word sense disambiguation and help learners study English through their native language (e.g., Chinese). In our approach, the different translations of a word determine the specificity of its senses. The method involves extracting word translations, training a classifier to sort words into sense groups based on their translations, and selecting sense-relevant example sentences. We present a prototype system, LanguageNet, that applies the proposed method to display bilingual synonyms and sense-relevant examples for each sense of a given word. Evaluation on a set of polysemous words shows that the method performs well at finding sense-relevant translations and bilingual examples.
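The idea that different translations signal different senses can be illustrated with a small sketch. This is not the thesis's implementation: the aligned sentence pairs, the sense labels, and the seed lexicon below are hypothetical toy data, a plain dictionary lookup stands in for the trained classifier, and "shortest sentence" stands in for a real example-selection score.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (English sentence, Chinese sentence,
# translation aligned to the target word "bank").
ALIGNED = [
    ("He deposited cash at the bank.", "他把現金存進銀行。", "銀行"),
    ("The bank approved the loan.", "銀行核准了貸款。", "銀行"),
    ("They walked along the river bank.", "他們沿著河岸散步。", "河岸"),
    ("The boat drifted toward the bank.", "小船漂向岸邊。", "岸邊"),
]

# Hypothetical seed lexicon: sense label -> translations known for that sense.
# A trained classifier would replace this lookup in a real system.
SENSE_SEEDS = {
    "bank.financial_institution": {"銀行"},
    "bank.sloping_land": {"河岸", "岸邊"},
}


def group_by_sense(aligned, seeds):
    """Assign each aligned sentence pair to the sense whose seeds contain its translation."""
    groups = defaultdict(list)
    for en, zh, translation in aligned:
        for sense, words in seeds.items():
            if translation in words:
                groups[sense].append((en, zh, translation))
    return dict(groups)


def pick_example(examples):
    """Pick the pair with the shortest English sentence (a stand-in quality score)."""
    return min(examples, key=lambda ex: len(ex[0].split()))


def build_entries(aligned, seeds):
    """Collect, per sense, its observed translations and one bilingual example."""
    entries = {}
    for sense, examples in group_by_sense(aligned, seeds).items():
        counts = Counter(translation for _, _, translation in examples)
        entries[sense] = {
            "translations": [t for t, _ in counts.most_common()],
            "example": pick_example(examples),
        }
    return entries


if __name__ == "__main__":
    for sense, entry in build_entries(ALIGNED, SENSE_SEEDS).items():
        print(sense, entry["translations"], entry["example"][:2])
```

Each resulting entry pairs a sense with its observed translations and one bilingual example, which is roughly the kind of record a lookup system such as LanguageNet would display.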

