Identifying Food-related Word Association and Topic Model Processing using LDA

This paper presents an interdisciplinary study that combines natural language processing and psycholinguistics. The latent Dirichlet allocation (LDA) topic model was used to compute semantic relatedness, with the aim of understanding the mechanisms and processes through which humans encode and retrieve lexical units. To test how closely the output of the topic model matches human word association, the "Time-limited Multiple Divergent Thinking Test of Word Associative Strategy" (TLM-DTTWAS) was used to collect data with three food-related stimulus words. A total of 101 subjects took the test, producing 4,251 words. The empirical results were analyzed on two levels. (1) Through expert classification, the responses were sorted in two ways. The first used the taxonomic and script categories of word association proposed by Ross and Murphy (1999). The second followed the associative hierarchy theory of Mednick (1962) and sorted the responses into two associative hierarchies, "steep" and "flat." This analysis indicated that human word association displays randomness as well as generalization and continuity. (2) The experimental text was then processed with the LDA latent semantic model. Comparing the model output with the human associations showed a highly significant correlation. This study is a novel attempt to train a data-science model to infer and predict human conceptual association, which could prove useful in teaching as well as in commercial applications.
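The abstract does not spell out how LDA relatedness was computed or how it was compared with the human data. The following is a minimal sketch, assuming the trained model's topic-word probabilities are available.

It uses one common topic-model measure of associative strength, used for example by Griffiths, Steyvers and Tenenbaum (2007). This measure predicts a response w2 to a cue w1 as P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), where P(z | w1) comes from Bayes' rule. The model scores are then rank-correlated with human response frequencies using Spearman's rho.

Several parts of the sketch are assumptions, not the paper's procedure:
- The vocabulary, topic-word probabilities (`PHI`), topic priors (`P_TOPIC`) and response counts are invented toy values, not the paper's data.
- The use of Spearman's rho is also an assumption.

```python
import math

# Hypothetical toy LDA output (NOT the paper's trained model):
# PHI[z][w] = P(w | z) for K = 2 topics over a five-word vocabulary.
VOCAB = ["apple", "fruit", "red", "pie", "computer"]
PHI = [
    [0.40, 0.30, 0.20, 0.10, 0.00],  # topic 0: food-like topic
    [0.30, 0.00, 0.05, 0.05, 0.60],  # topic 1: technology-like topic
]
P_TOPIC = [0.8, 0.2]  # hypothetical topic priors P(z)


def topic_posterior(word: str) -> list[float]:
    """P(z | word) via Bayes' rule: P(z | w) is proportional to P(w | z) P(z)."""
    i = VOCAB.index(word)
    joint = [PHI[z][i] * P_TOPIC[z] for z in range(len(PHI))]
    total = sum(joint)
    return [j / total for j in joint]


def association(cue: str, response: str) -> float:
    """Topic-model associative strength P(response | cue) = sum_z P(response | z) P(z | cue)."""
    post = topic_posterior(cue)
    j = VOCAB.index(response)
    return sum(PHI[z][j] * post[z] for z in range(len(PHI)))


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    responses = ["fruit", "red", "pie", "computer"]
    human_counts = [30, 20, 10, 5]  # hypothetical response frequencies for the cue "apple"
    model_scores = [association("apple", r) for r in responses]
    print(round(spearman(model_scores, human_counts), 3))  # → 0.8
```

With the food topic given the higher prior, the model ranks "fruit" and "red" as the strongest associates of "apple," matching the top of the hypothetical human ranking. "Pie" and "computer" swap places, so rho is 0.8 rather than 1. In practice the topic-word matrix would come from an LDA model trained on a corpus rather than being written by hand.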

Keywords

LDA (latent Dirichlet allocation); Mandarin Vocabulary Study; Semantic Priming; Time-limited Multiple Divergent Thinking Test of Word Associative Strategy (TLM-DTTWAS); Word Association

Parallel Abstract

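This study combines natural language processing and psycholinguistics and is therefore interdisciplinary. To understand the mechanisms and processes of human lexical cognition and acquisition, it uses latent Dirichlet allocation (LDA), a latent semantic topic model, to compute lexical semantic relatedness. To test how closely the output of the latent semantic model matches human word association, the study collected data at scale with the Time-limited Multiple Divergent Thinking Test of Word Associative Strategy, using three stimulus words. A total of 101 subjects took part, producing 4,251 individual words. The results were analyzed on two levels. (1) Two experts classified the responses (expert classification) in two ways. First, they used the word-association categories proposed by Ross and Murphy (1999), namely taxonomic and script categories. Second, they applied Mednick's (1962) associative hierarchy theory to divide the responses into two types: steep and flat associations. The analysis indicates that human association is not only random but also general and extensible. (2) The experimental text was processed with the LDA latent semantic model. Cross-comparison of the two sets of results confirmed a highly significant correlation, and the model output is consistent with the mechanisms of human learning and association. This study is a novel attempt to use data science to infer and predict human lexical and conceptual associations. In the future, the results could be applied to improve teaching and commercial applications.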
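Mednick's distinction between the two hierarchies can be made concrete with a small sketch. In a steep hierarchy, a few responses dominate; in a flat hierarchy, many responses occur with similar strength.

The paper assigned these labels by expert classification, so the rule below is only a hypothetical operationalization. It pools the responses given to one stimulus word and labels the distribution "steep" when the single most frequent response accounts for at least a chosen share of all responses. Both the top-share measure and the 0.3 cut-off (`STEEP_THRESHOLD`) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cut-off for illustration only; the paper used expert judgment, not this rule.
STEEP_THRESHOLD = 0.3


def top_share(responses: list[str]) -> float:
    """Fraction of all responses taken by the single most frequent response."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)


def hierarchy(responses: list[str], threshold: float = STEEP_THRESHOLD) -> str:
    """Label a pooled response distribution 'steep' if one response dominates, else 'flat'."""
    return "steep" if top_share(responses) >= threshold else "flat"


if __name__ == "__main__":
    # Hypothetical pooled responses to a food-related stimulus word.
    steep_like = ["fruit"] * 6 + ["red", "pie", "tree", "juice"]
    flat_like = ["fruit", "red", "pie", "tree", "juice",
                 "orchard", "cider", "seed", "core", "green"]
    print(hierarchy(steep_like), hierarchy(flat_like))  # → steep flat
```

A share-of-top-response rule is used here because it is easy to read. An entropy-based measure of how evenly responses are spread would be an equally reasonable choice.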

Parallel Keywords

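LDA (latent Dirichlet allocation); Mandarin vocabulary learning; semantic priming; Time-limited Multiple Divergent Thinking Test of Word Associative Strategy; word association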

References

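陳明蕾, 王學誠, & 柯華葳 (2009). 中文語意空間建置及心理效度驗證：以潛在語意分析技術爲基礎 [Construction and psychological validation of a Chinese semantic space based on latent semantic analysis]. 中華心理學刊 [Chinese Journal of Psychology], 51(4), 415-435.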

Altınel, B., & Ganiz, M. C. (2016). A new hybrid semi-supervised algorithm for text classification with class-based semantics. Knowledge-Based Systems, 108, 50-64. doi: 10.1016/j.knosys.2016.06.021

Baddeley, A. D. (1982). Domains of recollection. Psychological Review, 89(6), 708-729.

Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13-47.

De Boom, C., Van Canneyt, S., Demeester, T., & Dhoedt, B. (2016). Representation learning for very short texts using weighted word embedding aggregation. Pattern Recognition Letters, 80, 150-156. doi: 10.1016/j.patrec.2016.06.012
