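Word sense disambiguation (WSD) is a necessary task for machine translation, information retrieval, and summarization systems. With the rapid development of computational linguistics in recent years, WSD has come to be regarded as the next problem to be solved. Among the various types of WSD, disambiguating only a set of specific words reaches an accuracy above 70% in both Chinese and English, and English systems that disambiguate every word in a document can also reach 70% accuracy, yet no comparable Chinese WSD system currently exists. This thesis proposes a Chinese WSD system that uses the characteristics of Chinese WordNet together with a set of designed heuristic rules to disambiguate domain documents: the system first looks for a domain sense of the word; if the word is not a domain word, it tries to identify the likely sense from the overlap between all of the word's senses and the senses of the context words; finally, it considers the word's prototype sense. The system can currently disambiguate verbs and nouns in domain documents (e.g., environmental protection documents), and preliminary tests show an accuracy of 56%.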
Word Sense Disambiguation (WSD) is essential for language understanding systems such as machine translation, information retrieval, and summarization systems. WSD has also been regarded as the next crucial problem to solve. Among the various WSD tasks, the lexical sample task in Chinese achieves a precision of more than 70%, as does the all-words task in English, but no Chinese all-words WSD system is currently available. This thesis presents a WSD system for domain texts. It assumes that POS tagging and heuristic rules can help eliminate sense ambiguity. Three heuristic rules are applied in order: first, consider domain senses for words in the text; second, if no domain sense is available, identify the intended non-domain sense from the overlap between the word's sense definitions and those of its context words (the Lesk algorithm); third, if neither of the first two rules applies, assume that the prototype sense is the most likely one for a non-domain word in a domain text. The system achieved a precision of 56% on nouns and verbs in a domain text (e.g., an environment-related text).
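The three-rule cascade described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `Sense` record, its `domain` and `is_prototype` fields, the stopword list, the whitespace tokenizer, and the toy English inventory are all hypothetical stand-ins for a real sense inventory such as Chinese WordNet, and POS tagging is assumed to have already restricted the input to nouns and verbs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stopword list; a real system would use a proper tokenizer.
STOPWORDS = {"a", "an", "the", "of", "or", "in", "on", "to", "that", "into", "and"}


@dataclass(frozen=True)
class Sense:
    sense_id: str
    gloss: str
    domain: Optional[str] = None  # hypothetical domain label on a sense
    is_prototype: bool = False    # hypothetical flag for the prototype sense


def gloss_tokens(gloss: str) -> set:
    """Content words of a sense definition (lowercased, stopwords removed)."""
    return {t for t in gloss.lower().split() if t not in STOPWORDS}


def lesk_overlap(sense: Sense, context_senses: List[Sense]) -> int:
    """Simplified Lesk score: shared gloss words with the context senses."""
    own = gloss_tokens(sense.gloss)
    return sum(len(own & gloss_tokens(c.gloss)) for c in context_senses)


def disambiguate(word: str, context_words: List[str],
                 inventory: Dict[str, List[Sense]], domain: str) -> Optional[Sense]:
    senses = inventory.get(word, [])
    if not senses:
        return None
    # Rule 1: prefer a sense that belongs to the document's domain.
    for s in senses:
        if s.domain == domain:
            return s
    # Rule 2: otherwise pick the sense whose definition overlaps most
    # with the definitions of the context words' senses.
    context_senses = [s for w in context_words if w != word
                      for s in inventory.get(w, [])]
    scored = [(lesk_overlap(s, context_senses), s) for s in senses]
    best_score = max(score for score, _ in scored)
    if best_score > 0:
        return next(s for score, s in scored if score == best_score)
    # Rule 3: fall back to the prototype sense (or the first listed sense).
    for s in senses:
        if s.is_prototype:
            return s
    return senses[0]


# Toy inventory, invented purely for illustration.
inventory = {
    "emission": [
        Sense("emission.1", "release of gas or pollutants into the air", domain="environment"),
        Sense("emission.2", "the act of sending out light or radiation"),
    ],
    "bank": [
        Sense("bank.1", "financial institution that accepts deposits of money", is_prototype=True),
        Sense("bank.2", "sloping land beside a river"),
    ],
    "river": [Sense("river.1", "large natural stream of water flowing across land")],
    "run": [
        Sense("run.1", "move fast on foot", is_prototype=True),
        Sense("run.2", "operate a machine"),
    ],
}
```

In this toy data, `emission` is resolved by the domain rule, `bank` next to `river` is resolved by definition overlap, and `run` with no informative context falls back to its prototype sense. Applying the rules in a fixed order means the cheaper, higher-confidence domain lookup is tried before the noisier overlap score.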