
Extracting Shared Synonym Concepts through Dictionary-Definition Association: The Case of Tongyici Cilin (Extended Edition)

A Definition-based Shared-concept Extraction within Groups of Chinese Synonyms: A Study Utilizing the Extended Chinese Synonym Forest

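Abstract


Synonyms are an important linguistic resource for information extraction and semantic classification, but the reasons for grouping two words as synonyms deserve closer examination. From the perspective of sense, when a polysemous word is assigned to a particular synonym group, at least one of its senses should be synonymous with the words in that group. A representative resource of this kind is Tongyici Cilin (Mei, Zhu, Gao and Yin, 1983), which organizes Chinese synonyms into structured categories. From the perspective of computational linguistics, synonym association draws on the frequency of words in a corpus, supplemented by machine learning methods that compute similarity between synonyms. However, the expert classification in the former approach relies on linguistic intuition; if the principles behind the synonym categories are not defined, later users will be confused about the synonyms. The machine learning methods in the latter approach identify similar words statistically and therefore lack semantic discrimination. To understand the conceptual content of synonym groups, this study proposes an association measure based on the text of dictionary definitions: by computing the proportion of definition words that two entries share, it analyzes the definitional concepts the two words have in common. Using Tongyici Cilin (Extended Edition) as an example, we list, from the perspective of definitional meaning, the words best suited to interpreting each synonym group, highlighting the semantics that the category contains. Finally, we compare these results with the similar words obtained from the Sketch Engine (Kilgarriff et al., 2004). Although the results of this study are affected by the content of the dictionary definitions, such definitions interpret the concepts shared between words in terms of word meaning better than manual classification principles or numerical data derived from corpus statistics. We hope that the definition-association method leads to a better understanding of the concepts at the intersection of words, and that it offers considerations for writing dictionary definitions and entries in the semantic computation of synonyms.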

Keywords

synonym concepts; Tongyici Cilin; definitions; dictionary

Parallel Abstract


Synonym groups can serve as rich linguistic metadata for information extraction and word sense disambiguation. Nevertheless, the reasons two words are placed in the same synonym group need further study, especially when no explanation is available as to why the two words are synonymous. Lexical resources such as the Chinese Synonym Forest (Tongyici Cilin) (Mei et al. 1983) assemble lexical items into hierarchical categories through manual categorization. Alternatively, statistical measures such as co-occurrence probability have been widely adopted to verify synonymous relationships. However, a purely statistical method does not provide a description that helps interpret why such a synonymous relationship holds. We propose a novel method for studying the concepts shared within a synonym group by comparing the words that co-occur in the dictionary definitions of its members. These co-occurring words are treated as representatives of shared concepts and can be used to interpret the meaning implicit among members of a synonym group. We also compare our results with the thesaurus function in the Sketch Engine (Kilgarriff et al. 2004), which uses statistical data in the form of Sketch scores. The results show that our method can produce concept words from dictionary definitions, but it also has limitations: it works only with a finite number of synonyms and under limited computing resources.
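
The definition-comparison idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the ratio used here (the fraction of group members whose definition contains a given word) is an assumption, as are the function names `shared_concept_ratios` and `top_concepts`, the `min_ratio` threshold, and the invented English toy definitions. Real Chinese definitions would first need to be segmented into words.

```python
from collections import Counter


def shared_concept_ratios(definitions):
    """Map each definition word to the fraction of group members
    whose definition contains it (assumed ratio, not the paper's formula)."""
    if not definitions:
        return {}
    counts = Counter()
    for tokens in definitions.values():
        # Count each word at most once per group member.
        counts.update(set(tokens))
    group_size = len(definitions)
    return {word: count / group_size for word, count in counts.items()}


def top_concepts(definitions, min_ratio=0.5):
    """Return (word, ratio) pairs with ratio >= min_ratio,
    highest ratio first, ties broken alphabetically."""
    ratios = shared_concept_ratios(definitions)
    ranked = sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
    return [(word, ratio) for word, ratio in ranked if ratio >= min_ratio]


# Hypothetical synonym group with invented, pre-tokenized definitions.
toy_group = {
    "glance": ["look", "quickly"],
    "stare": ["look", "fixedly", "long"],
    "gaze": ["look", "steadily", "long"],
}

if __name__ == "__main__":
    for word, ratio in top_concepts(toy_group):
        print(f"{word}: {ratio:.2f}")
```

On the toy group, `look` appears in all three definitions and ranks first, followed by `long`, which appears in two of the three; words found in only one definition fall below the assumed threshold and are dropped.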

References


趙逢毅、鍾曉芳(2011)。基於辭典詞彙釋義之多階層語義關聯程度計量─以「目」字部為例。中文計算語言學期刊。16(3-4),21-40。
林頌堅(2004)。基於術語抽取與術語叢集技術的主題。Computational Linguistics and Chinese Language Processing。9(1),97-112。
曾慧馨、劉昭麟、高照明、陳克健(2002)。以構詞與相似法為本的中文動詞自動分類研究。International Journal of Computational Linguistics and Chinese Language Processing。7(1),1-28。
Thesaurus Entry, https://trac.sketchengine.co.uk/wiki/SkE/Help/PageSpecificHelp/Thesaurus, last visited 2012/6/30.
CKIP Chinese Word Segmentation System (Academia Sinica), http://ckipsvr.iis.sinica.edu.tw/, last visited 2012/6/27.

Cited by


吳孟哲(2015)。中華現代人名與稱謂之結構分析〔Master's thesis, National Chiao Tung University〕。Airiti Library。https://doi.org/10.6842/NCTU.2015.00693
黃政華(2017)。發展適應性中文相似詞庫於口碑分類〔Master's thesis, Chung Yuan Christian University〕。Airiti Library。https://doi.org/10.6840/cycu201700784
