話題建模在中國古代典籍分析中的運用

An Application of Topic Modeling in the Analysis of Ancient Chinese Classical Works

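Abstract


Topic modeling is an important method for computational analysis of large corpora; it can uncover the topics latent in massive volumes of text. As a key research tool in natural language processing, it has been applied more and more widely to the analysis of modern Chinese texts, but rarely to ancient Chinese, that is, texts in classical (literary) Chinese. This paper takes three pre-Qin Confucian classics, the Analects (《論語》), the Mencius (《孟子》), and the Xunzi (《荀子》), as its objects of study. It uses topic modeling to analyze, compare, and discuss how thematic ideas are distributed across the three works and how they change from one to the next. The aim is to explore the prospects for applying "machine reading" to the study of ancient Chinese texts.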

Parallel Abstract


Topic modeling is a digital method for discovering hidden thematic structure in large collections of unlabeled texts. It is now widely used to analyze large volumes of modern Chinese text from web pages, new media, and social networks, for tasks such as document classification and clustering, hot event detection and tracking, and opinion mining. This paper applies topic modeling to the three principal classical works of pre-Qin Confucianism in order to discuss their ideological inheritance and development. The aim is to cast new light on the "close and direct reading" of classical Chinese texts through "distant and machine reading," and to encourage more creative use of digital methods in research on Chinese classical works.

Keywords

machine reading; text analysis; topic modeling; MALLET

