
《現代漢語新詞語資訊電子詞典》的研究與實現

Development and Study of the "Modern Chinese New Words Information Electronic Dictionary"

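Abstract


This paper introduces the Modern Chinese New Words Information Electronic Dictionary, which we are currently developing, from four angles: (1) the definition of new words in Modern Chinese, (2) the design principles behind the dictionary, (3) the collection of new words and the description of their attribute information, and (4) our practice in classifying nearly 40,000 new words. By new words we mean general-language words that have arisen through various channels since 1978 and that have a new form, a new meaning, or a new usage not found in the basic vocabulary. Besides being new in at least one of form, meaning, or usage, a word must also be in common, widespread use in everyday life; personal names, place names, and specialized terminology are not new words in our sense. We follow an open principle and collect new words as comprehensively as possible. Guided by the idea that the resource should serve both human and machine users, and taking the Contemporary Chinese Grammatical Knowledge Base (《現代漢語語法資訊詞典》) of the Institute of Computational Linguistics at Peking University as our model, we aim to build an electronic dictionary of Modern Chinese new words with comprehensive coverage, rich information, and a high degree of resource sharing, as a valuable resource for research on new words and on Chinese information processing. Nearly 40,000 new words have been collected so far. We first classified them according to the dominant grammatical function of each Modern Chinese word class, and then described the attribute information of every word in detail in a mature relational database (implemented in ACCESS). The database consists of one main base and three grammatical information bases (a noun base, a verb base, and an adjective base), together with a word-formation base, an old-word base, a loanword base, and an abbreviation base. The main base is linked to the other bases through three fields (word, pinyin, and sense), forming an organic system with superordinate and subordinate relations that makes information easy to extract. In total the bases define more than 200 attribute fields, covering each word's phonetic, semantic, source, word-formation, and syntactic information, along with some pragmatic information. The dictionary currently has the largest number of entries and the most descriptive information of any new-word dictionary in the country.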

Parallel Abstract


We introduce the development of the Electronic Lexicon of Contemporary Newborn Chinese Words in four parts: (1) the definition of a newborn word, (2) the main principle behind constructing the lexicon, (3) the collection of newborn words and the description of their features, and (4) the classification of nearly 40,000 newborn words. In our view, a newborn word is a word that has appeared since 1978 with a new form, a new meaning, or a new usage. In addition, it must be frequently used and widely accepted in everyday language; personal names, place names, and technical terms are not newborn words under our definition. Our approach to collecting newborn words is deliberately unrestricted: the more, the better. Taking the Contemporary Chinese Grammatical Knowledge Base of the Institute of Computational Linguistics at Peking University as a model, we have compiled a lexicon of almost 40,000 newborn words semi-automatically. We believe the lexicon is a valuable resource for research on Chinese word-formation rules and for natural language processing. Each word is first classified according to its predominant grammatical characteristics, and its detailed features are then described in an ACCESS database. The lexicon contains a main base and three grammatical bases (a noun base, a verb base, and an adjective base); it also has a word-formation base, an old-word base, a loanword base, and an acronym base. The main base is linked to the sub-bases through three fields (word, phonetic notation, and sense), which form a hypernymy hierarchy that is convenient for searching. In total, the bases contain more than 200 fields giving information on phonetic notation, semantics, sources, word formation, syntax, and pragmatics. The lexicon is currently the largest domestic lexicon of newborn Chinese words and the one with the most detailed descriptions.
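The linkage described in the abstract, where a main base is tied to the sub-bases through the three fields word, pinyin, and sense, can be sketched as a composite key shared across tables. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's standard `sqlite3` module rather than the ACCESS environment the authors report, it models only two of the sub-bases, and every table name, column name (`pos`, `takes_object`, `source_language`), and sample row is hypothetical and not taken from the dictionary itself.

```python
import sqlite3

# Hypothetical schema: the abstract names the three linking fields
# (word, pinyin, sense) but not the table or attribute names used here.
SCHEMA = """
CREATE TABLE main_base (
    word   TEXT    NOT NULL,
    pinyin TEXT    NOT NULL,
    sense  INTEGER NOT NULL,
    pos    TEXT    NOT NULL,          -- assumed part-of-speech attribute
    PRIMARY KEY (word, pinyin, sense)
);
CREATE TABLE verb_base (
    word   TEXT    NOT NULL,
    pinyin TEXT    NOT NULL,
    sense  INTEGER NOT NULL,
    takes_object INTEGER,             -- assumed syntactic attribute (0 or 1)
    PRIMARY KEY (word, pinyin, sense),
    FOREIGN KEY (word, pinyin, sense)
        REFERENCES main_base (word, pinyin, sense)
);
CREATE TABLE loanword_base (
    word   TEXT    NOT NULL,
    pinyin TEXT    NOT NULL,
    sense  INTEGER NOT NULL,
    source_language TEXT,             -- assumed source attribute
    PRIMARY KEY (word, pinyin, sense),
    FOREIGN KEY (word, pinyin, sense)
        REFERENCES main_base (word, pinyin, sense)
);
"""


def open_lexicon(path=":memory:"):
    """Open a database and create the hypothetical main base and sub-bases."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the three-field link
    conn.executescript(SCHEMA)
    return conn


def add_samples(conn):
    """Insert illustrative rows; the attribute values are placeholders."""
    conn.executemany(
        "INSERT INTO main_base VALUES (?, ?, ?, ?)",
        [("克隆", "kè lóng", 1, "v"), ("下海", "xià hǎi", 1, "v")],
    )
    conn.executemany(
        "INSERT INTO verb_base VALUES (?, ?, ?, ?)",
        [("克隆", "kè lóng", 1, 1), ("下海", "xià hǎi", 1, 0)],
    )
    conn.execute(
        "INSERT INTO loanword_base VALUES (?, ?, ?, ?)",
        ("克隆", "kè lóng", 1, "English"),
    )
    conn.commit()


def lookup(conn, word):
    """Collect a word's attributes from the main base and both sub-bases."""
    rows = conn.execute(
        """
        SELECT m.word, m.pinyin, m.sense, m.pos,
               v.takes_object, l.source_language
        FROM main_base AS m
        LEFT JOIN verb_base AS v
               ON v.word = m.word AND v.pinyin = m.pinyin AND v.sense = m.sense
        LEFT JOIN loanword_base AS l
               ON l.word = m.word AND l.pinyin = m.pinyin AND l.sense = m.sense
        WHERE m.word = ?
        ORDER BY m.sense
        """,
        (word,),
    ).fetchall()
    return [dict(row) for row in rows]


if __name__ == "__main__":
    lexicon = open_lexicon()
    add_samples(lexicon)
    for entry in lookup(lexicon, "克隆"):
        print(entry)
```

Keying every sub-base on the same three fields means that a homograph with several readings or senses gets one row per sense, and a sub-base row can exist only when the matching main-base row exists. A word that does not belong to a given sub-base simply yields `None` for that sub-base's attributes in the joined result.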

