Associating Collocations with WordNet Senses Using Hybrid Models

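In this paper, we propose a hybrid model that classifies English collocations into sense classes selected from WordNet. The hybrid model combines a machine-learning-based method, a paraphrase-based method, and a sense frequency ranking method. To train the machine learning model, we use collocations that have already been annotated with sense classes, together with sentences extracted from a large corpus and cross-lingual data. At run time, the sense of an input collocation is decided by a vote among three predictions: 1. the sense predicted by the machine-learning-based method; 2. the sense predicted by the paraphrase-based method; 3. the sense predicted by sense frequency ranking. The input collocation is assigned to the sense that receives the most votes. Experimental results show that the hybrid model significantly outperforms the other methods compared in this paper, and that it provides more reliable collocation-sense pairings to support dictionary compilation and collocation learning.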

Keywords

collocation classification; word sense disambiguation; WordNet; maximum entropy model; paraphrase

Parallel Abstract

In this paper, we introduce a hybrid method for associating English collocations with sense classes chosen from WordNet. Our combined approach includes a learning-based method, a paraphrase-based method, and a sense frequency ranking method. At training time, a set of collocations tagged with their senses is prepared. We use sentence information extracted from a large corpus, together with cross-lingual information, to train a learning-based model. At run time, the sense of an input collocation is decided by majority voting. The three outcomes that take part in the vote are: 1. the result from the learning-based model; 2. the result from the paraphrase-based model; 3. the result from the sense frequency ranking method. The sense with the most votes is associated with the input collocation. The evaluation shows that the hybrid model achieves a significant improvement over the other methods compared in the evaluation. Our method provides more reliable results for associating collocations with senses, which can help lexicographers compile collocation dictionaries and help learners understand collocation usage.
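The run-time voting step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `vote_sense`, the sense labels in the usage example, and the tie-breaking rule (falling back to the frequency-ranked sense when all three predictions disagree) are assumptions made for illustration, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def vote_sense(learning_sense, paraphrase_sense, frequency_sense):
    """Pick a sense class by majority vote over three predictions.

    Each argument is the sense class label predicted by one component:
    the learning-based model, the paraphrase-based model, and the
    sense frequency ranking method.
    """
    votes = Counter([learning_sense, paraphrase_sense, frequency_sense])
    top_sense, top_count = votes.most_common(1)[0]
    if top_count >= 2:
        # At least two of the three components agree.
        return top_sense
    # Assumption (not stated in the abstract): when all three components
    # disagree, fall back to the frequency-ranked sense.
    return frequency_sense


# Hypothetical sense labels for a single input collocation.
print(vote_sense("weather", "weather", "weight"))
```

With three voters, a sense wins as soon as two components agree, so the fallback only matters when all three predictions differ.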

Parallel Keywords

collocation classification; word sense disambiguation; WordNet; maximum entropy model; paraphrase

