A Study of Semantic Disambiguation Based on HowNet

Abstract


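This thesis proposes a semantic disambiguation model for syntactic parsing in machine translation, with HowNet as its main source of semantic knowledge. HowNet is a common-sense knowledge base. It describes the concepts represented by Chinese and English words, and its basic content is the relations between concepts and between the attributes those concepts possess. It therefore supplies rich semantic information for disambiguation. The model combines rule-based and statistics-based methods. It is applied to the intermediate structures produced during parsing and disambiguates word senses and structures by "preference".

The model first acquires the co-occurrence set of each sense-atom from a large corpus. The corpus carries no semantic tagging, so the acquisition is unsupervised. The model then constructs semantic restriction rules for the sense-atoms according to a transfer template. Each word sense in HowNet is composed of sense-atoms, so the restriction rule of a sense can be derived from the rules of its constituent sense-atoms.

In the disambiguation stage, we first determine the context-related word set of each notional word in the input sentence. The semantic relations among notional words play a crucial role in determining the syntactic structure of the sentence and in selecting word senses. For this reason, our evaluation of a sentence rests on the evaluation of its notional words. We compare a word's current context-related word set with the semantic features described by the restriction rule of each of its senses. The sense with the highest similarity is selected, and this maximum similarity also serves as the word's evaluation score. The scores of the notional words in an intermediate parse then form the basis for evaluating that parse. This allows the best result to be chosen among several intermediate structures. In this way, structural ambiguity is resolved together with lexical ambiguity.

The proposed model has been implemented in a machine translation system. Tests on example sentences show that it is effective in resolving lexical and structural ambiguity in syntactic parsing. They also show that it outperforms the traditional YES/NO approach.

The thesis first presents the main idea of the model and briefly introduces HowNet. It then describes how to extract sense-atom co-occurrence information from the corpus and convert it into semantic restriction rules. Next, it details the disambiguation algorithm. This covers constructing the context-related word sets and computing two similarities: between sense-atoms, and between semantic rules and context word sets. Finally, it reports experimental results for the model.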
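
The acquisition step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The toy lexicon, the co-occurrence window and the `min_share` threshold are all hypothetical. The threshold stands in for the unspecified transfer template. Averaging atom rules into a sense rule is likewise an assumed choice. Because the corpus is untagged, every sense-atom of every sense of a word is credited with each co-occurrence.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon (NOT real HowNet data): word -> senses,
# where each sense is a tuple of sense-atoms.
LEXICON = {
    "bank": [("institution", "money"), ("land", "waterside")],
    "loan": [("money", "transfer")],
    "river": [("water", "waterside")],
    "fish": [("animal", "water")],
}

def atoms_of(word):
    """Union of the sense-atoms of every sense of `word`. The corpus is
    untagged, so the intended sense is unknown and all atoms are credited."""
    return {a for sense in LEXICON.get(word, []) for a in sense}

def cooccurrence(corpus, window=2):
    """counts[a][b]: how often atom b occurs in a word within `window`
    tokens of a word carrying atom a (unsupervised, no sense tags)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for a in atoms_of(word):
                    for b in atoms_of(sentence[j]):
                        counts[a][b] += 1
    return counts

def atom_rules(counts, min_share=0.1):
    """Hypothetical transfer template: an atom's restriction rule keeps the
    co-occurring atoms whose share of its co-occurrence mass >= min_share."""
    rules = {}
    for a, ctr in counts.items():
        total = sum(ctr.values())
        rules[a] = {b: c / total for b, c in ctr.items() if c / total >= min_share}
    return rules

def entry_rule(sense, rules):
    """Restriction rule of a word sense: the average of its atoms' rules."""
    merged = Counter()
    for a in sense:
        for b, weight in rules.get(a, {}).items():
            merged[b] += weight / len(sense)
    return dict(merged)

# Usage on a two-sentence toy corpus.
corpus = [["bank", "loan"], ["river", "fish"]]
counts = cooccurrence(corpus)
rules = atom_rules(counts)
loan_rule = entry_rule(LEXICON["loan"][0], rules)
```

Crediting all senses adds noise for ambiguous words like the toy "bank". The usual hope behind such unsupervised counting is that correct pairings recur across many sentences while spurious ones stay scattered.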

Parallel Abstract


This thesis describes a semantic disambiguation model applied in the syntactic parsing process of a machine translation system. The model uses HowNet as its main semantic resource. HowNet is a common-sense knowledge base that unveils the inter-conceptual and inter-attribute relations of the concepts connoted by Chinese words and their English equivalents. It provides rich semantic information for disambiguation. The model disambiguates word senses and structures by "preference", which is applied to the results produced by the parsing process. It combines rule-based and statistics-based methods.

First, we extract the co-occurrence information of each sense-atom from a large corpus. The corpus is untagged, so the extraction is unsupervised. We then construct restriction rules from the co-occurrence information according to a transfer template. Each semantic entry of a word in HowNet is made up of sense-atoms, so the restriction rules for every entry of any word can be derived from them.

During disambiguation, the model constructs a context-related word set for each notional word in the input sentence. Semantic collocation relations between notional words play a very important role in syntactic structure disambiguation. Our evaluation of candidate structures is therefore based on how tightly the notional words in each structure match one another. We compare the context-related word set of a word in the current structure with all of that word's restriction rules in the lexicon and find the best match. The entry with the best match is taken as the word's sense. The degree of similarity shows how well the word matches the other notional words in the structure, so it serves as the word's score. Different candidate parses of a structure link words differently. The same word therefore has a different context-related word set in each candidate and receives a different score.
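
The matching step can be illustrated with a small sketch. The abstract does not give the similarity formulas, so several parts here are assumptions chosen for illustration. These are the sense-atom hierarchy in `PARENT`, the distance-based measure `alpha / (alpha + d)` in `atom_similarity`, and the weighting in `rule_context_similarity`. The sketch shows two things. The sense with the highest similarity is chosen, and that maximum doubles as the word's score.

```python
# Hypothetical sense-atom hierarchy (child -> parent); NOT real HowNet data.
PARENT = {
    "institution": "entity",
    "land": "entity",
    "waterside": "land",
    "money": "entity",
    "water": "entity",
    "animal": "entity",
}

def path_to_root(atom):
    """The atom followed by its ancestors up to the root."""
    path = [atom]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def atom_similarity(a, b, alpha=1.6):
    """Assumed measure: alpha / (alpha + d), where d is the number of edges
    between the two atoms through their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next((x for x in pa if x in pb), None)
    if common is None:
        return 0.0
    d = pa.index(common) + pb.index(common)
    return alpha / (alpha + d)

def rule_context_similarity(rule, context_atoms):
    """Match a sense's restriction rule (atom -> weight) against the
    context-related atom set: each rule atom contributes its weight times
    its best similarity to any context atom; the result lies in [0, 1]."""
    total = sum(rule.values())
    if total == 0 or not context_atoms:
        return 0.0
    matched = sum(w * max(atom_similarity(b, c) for c in context_atoms)
                  for b, w in rule.items())
    return matched / total

def choose_sense(sense_rules, context_atoms):
    """Return (index of best sense, its similarity). The maximum similarity
    doubles as the word's evaluation score; ties go to the later sense."""
    scored = [(rule_context_similarity(r, context_atoms), i)
              for i, r in enumerate(sense_rules)]
    best_score, best_index = max(scored)
    return best_index, best_score

# Usage: two hypothetical senses of "bank" in a water/animal context.
bank_rules = [{"money": 0.8, "institution": 0.2},  # hypothetical financial sense
              {"water": 0.7, "waterside": 0.3}]    # hypothetical riverside sense
sense, score = choose_sense(bank_rules, {"water", "animal"})
```

Here the riverside sense scores about 0.80 and the financial sense about 0.44, so `sense` is 1. The graded score, rather than a yes/no verdict, is what the next step aggregates over a whole candidate structure.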
We then select the best parse according to the scores of all the notional words in the sentence. In this way, most word sense ambiguities and structural ambiguities are resolved at the same time. The semantic disambiguation model proposed in this thesis has been implemented in the MTG system. Our experiments show that the model is effective for this purpose. They also show that it is more tolerant than the traditional clear-cut YES/NO method and performs better. The thesis first puts forward the general idea of the method and briefly introduces the HowNet dictionary. It then presents methods for extracting co-occurrence information for each sense-atom from the corpus and transforming this information into restriction rules. Next, it presents the disambiguation algorithm in detail. This covers constructing the context-related word sets and calculating two similarities: between sense-atoms, and between restriction rules and context-related word sets. The experimental results given at the end of the thesis show that the method is effective.
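
Selecting among candidate parses can be sketched in the same hypothetical style. The sketch rests on three assumptions. Each candidate structure is reduced to a list of (head, dependent) links between notional words. A word's context-related set is the set of sense-atoms of the words it is linked to. A simple weighted-overlap measure stands in for the thesis's similarity calculation. The data for "saw the man with the telescope" is invented for illustration. Each candidate receives a graded total rather than a pass/fail verdict, which is what makes this style of scoring more tolerant than a YES/NO check.

```python
from collections import defaultdict

# Hypothetical data for "saw the man with the telescope" (NOT real HowNet).
# Restriction rules per sense: word -> list of {sense-atom: weight}.
RULES = {
    "saw": [{"human": 0.5, "tool": 0.5}, {"wood": 1.0}],
    "man": [{"look": 0.5, "clothing": 0.5}],
    "telescope": [{"look": 0.6, "human": 0.4}],
}
# Sense-atoms each word contributes to its neighbours' context sets.
ATOMS = {"saw": {"look"}, "man": {"human"}, "telescope": {"tool"}}

def context_sets(arcs):
    """Context-related atoms of each word: the atoms of the words it is
    directly linked to in the candidate structure (assumed definition)."""
    ctx = defaultdict(set)
    for head, dep in arcs:
        ctx[head] |= ATOMS[dep]
        ctx[dep] |= ATOMS[head]
    return ctx

def overlap(rule, atoms):
    """Stand-in similarity: share of the rule's weight found in the context."""
    return sum(w for a, w in rule.items() if a in atoms) / sum(rule.values())

def parse_score(arcs):
    """Sum over notional words of their best-sense similarity (word score)."""
    ctx = context_sets(arcs)
    return sum(max(overlap(r, ctx[w]) for r in RULES[w]) for w in ctx)

def best_parse(candidates):
    """Pick the candidate structure with the highest total score."""
    return max(candidates, key=lambda name: parse_score(candidates[name]))

CANDIDATES = {
    "A": [("saw", "man"), ("saw", "telescope")],  # telescope as instrument
    "B": [("saw", "man"), ("man", "telescope")],  # the man has a telescope
}
```

In this toy example, reading A totals 1.0 + 0.5 + 0.6 = 2.1 and reading B totals 0.5 + 0.5 + 0.4 = 1.4. `best_parse(CANDIDATES)` therefore returns "A". Because each word's best sense is chosen while scoring the structure, the lexical and structural decisions are made together.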

Cited By


Hsieh, C. H. (2014). 人工智慧個人助理之設計與實作 [Design and implementation of an artificial intelligence personal assistant] [Doctoral dissertation, National Chung Cheng University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0033-2110201613585917

延伸閱讀