
DeepLEX: Toward a Knowledge-Oriented Foundation for Artificial Intelligence

DeepLEX: Toward a Knowledge-yielding Approach and Resource for AI

Abstract


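Against the backdrop of big data and high-performance computing, recent deep learning neural networks have achieved major successes in speech processing and other recognition tasks. In particular, since the introduction of word embeddings, a distributional vector-semantic representation, computers have gradually come to capture the lexical semantic relations of human language. However, the rich hierarchical relations found in language and conceptual knowledge remain difficult for current neural network architectures to represent and generalize. In computational linguistics, researchers working from different lexical-theoretic hypotheses have developed a variety of lexical resources. These resources attempt to supply the paradigmatic knowledge that computers find hard to learn from syntagmatic data, so that computers can move closer to the human ability to reason in unfamiliar situations from small amounts of data and to understand, and even empathize with, human emotions. What these human abilities have in common is that they involve the interaction of individual, social, and cultural contexts and are marked by high contextual variability, which makes them hard for computers to learn from massive amounts of thin data. This study adopts the perspective of computational functional linguistics and regards the lexicon as an explicit repository of human linguistic knowledge. Human annotation combined with automatically extracted records is one of the important foundations for autonomous learning in artificial general intelligence. The study further argues that, beyond form-meaning pairings, the linguistic knowledge in a lexicon should also reflect the fluidity of expressive forms in Chinese and the interplay between expressive form and meaning. The aim of the study is to integrate and develop a "deep lexicon" that includes variables at multiple levels, such as linguistics, psychology, and Chinese language teaching, together with an annotation tool that lets users freely determine Chinese formulae, and to discuss possible future applications of this language resource.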

Parallel Abstract


Deep learning and neural networks have made substantial progress in recent years. After the introduction of word embeddings, a form of distributional vector semantics, computers could better simulate the lexical semantic relationships between words. However, the hierarchical nature of human language and concepts is still difficult to model with current approaches. In computational linguistics, researchers have developed lexical resources from different theoretical perspectives. These language resources attempt to bridge the gap between syntagmatic relationships, which computers can readily model from data, and paradigmatic knowledge, which computers do not readily grasp. Such knowledge is essential for the capability to reason in an unfamiliar context from only a small amount of data, and is also vital for developing empathy with human emotions. What these capabilities have in common is high context variance, in which individual, social, and cultural contexts intertwine, and this poses a great challenge for computers that learn in a data-hungry way. The current study considers the lexicon, as one would argue in computational functional linguistics, to be an explicit knowledge base of human language. Human annotation aided by automatic extraction is an essential building block of strong artificial intelligence. Moreover, the knowledge stored in the lexicon not only contains the pairing between forms and meanings; it should also address the fluidity of formulae and the dynamics between form-meaning pairings. The goal of the current study is thus to integrate and develop a novel lexicon model called DeepLex that includes multilevel lexical properties, such as linguistic, psychological, and pedagogical ones. A web-based tool is also developed to help users freely determine and annotate formulae in Chinese. Further applications of DeepLex are also discussed.

References


Wu, H. H. (2018). Investigating and Recognizing Lavender Language in a GenderNLP Perspective (Unpublished master's dissertation). Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan. doi: 10.6342/NTU201804148
Yang, C. C. (2015). Measuring Early Vocabulary Growth in Mandarin-Speaking Children: A Corpus-Based Study (Unpublished master's dissertation). Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan. doi: 10.6342/NTU.2015.02000
Wang, P. Y. (2015). Secrets of Lexical Conventionalization: A Quantitative and Qualitative Exploratory Analysis on Linguistic Factors (Unpublished master's dissertation). Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan. doi: 10.6342/NTU.2015.01992
Lu, P. Y. (2015). Affective Lexicon in Chinese - Construction and Annotation (Unpublished master's dissertation). Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan. doi: 10.6342/NTU201602978
Liu, Y. W. (2017). Exploring Topics in Online Discourses on Depression (Unpublished master's dissertation). Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan. doi: 10.6342/NTU201700670

Cited by


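蕭惠貞、詹士微、陳瀅伃 (2022). Reflections on the Pedagogical Application of an Artificial Intelligence Learning Platform: The Case of Legal Chinese Texts. 臺大華語文學習與科技, 2(1), 107-143. https://doi.org/10.30050%2fCLLT.202206_2(1).0004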
