在本論文中,我們提出一個混合式模型將英文搭配詞歸類到由詞網中所選取出來的詞意分類中。此混合式模型包含了基於機器學習、基於意譯、詞意頻率排序等方式。在訓練機器學習模型時,我們使用了已經標註好詞意分類的搭配詞,並利用由大型語料庫中所抽取出來的句子以及跨語言資料來幫助訓練模型。在執行時,輸入的搭配詞所對應的詞意由投票來決定,而投票的依據包含了以下幾種方式:1.基於機器學習的方式所預測的詞意;2.基於意譯的方式所預測的詞意;3.由詞意頻率排序的方式所預測的詞意;輸入的搭配詞將會被歸類到獲得最高票的詞意。實驗結果顯示,我們所使用的混合式模型比起在本論文中所比較的其他方式表現有顯著的提升,並提供了更可靠的搭配詞與詞意配對以幫助編撰字典和搭配詞學習。
In this paper, we introduce a hybrid method for associating English collocations with sense classes selected from WordNet. Our combined approach includes a learning-based method, a paraphrase-based method, and a sense frequency ranking method. At training time, we prepare a set of collocations tagged with their sense classes and train the learning-based model using sentences extracted from a large corpus together with cross-lingual information. At run time, the sense of an input collocation is decided by majority voting over three outcomes: 1. the result of the learning-based model; 2. the result of the paraphrase-based model; 3. the result of the sense frequency ranking method. The sense with the most votes is associated with the input collocation. Evaluation shows that the hybrid model achieves a significant improvement over the other methods compared in this paper. Our method provides more reliable associations between collocations and senses, which can help lexicographers compile collocation dictionaries and help learners understand collocation usage.
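The run-time voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `vote_sense`, the sense labels, and the tie-breaking rule (prefer the earliest prediction in the list) are assumptions made for illustration, since the abstract does not say how ties among three differing predictions are resolved.

```python
from collections import Counter
from typing import Sequence


def vote_sense(predictions: Sequence[str]) -> str:
    """Return the sense class that receives the most votes.

    `predictions` holds one predicted sense class per component model,
    e.g. [learning_based, paraphrase_based, frequency_ranking].
    Assumption: ties are broken in favour of the earliest prediction
    in the list; the source does not specify a tie-breaking rule.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so the earliest top-voted sense wins a tie.
    for sense in predictions:
        if counts[sense] == best:
            return sense
    raise AssertionError("unreachable: some sense always has the top count")


# Hypothetical sense-class labels, used only as placeholders.
example = vote_sense(["FEELING", "FEELING", "ACT"])
```

With this ordering, placing the component judged most reliable first makes it the fallback whenever all three components disagree; which component that should be is a design choice the abstract leaves open.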