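Leishu (類書) are an important type of reference work in Chinese history. A leishu extracts passages of knowledge from earlier texts and compiles them according to its own classification scheme and editorial structure, grouping like with like, in order to organize the classics and make them easy to consult. Leishu first appeared in the Three Kingdoms period and have developed in China for nearly two thousand years; over time they drew on more and more texts, and their classification schemes became increasingly detailed. Among those that survive, the Gujin Tushu Jicheng (《古今圖書集成》), compiled during the Kangxi and Yongzheng reigns of the Qing dynasty, is the most important and the richest in material, and it remains a valuable reference work today. The Gujin Tushu Jicheng contains more than 170 million characters, drawn from over ten thousand texts dating from antiquity to the early Qing, and covers a very wide range of subjects. Browsing and searching a work of this size is difficult, so this study applies information technology to the problem. The study has three parts. The first describes the structure of the Gujin Tushu Jicheng and, following that structure, designs a processing pipeline that splits the collected passages into independent entries, which are then loaded into the Taiwan History Digital Library (THDL) model so that users can consult them easily. The second part deals with the source texts cited for each entry, using computational methods to correct erroneous source attributions and supply missing ones, with the aim of organizing the classics and even recovering lost texts (輯佚). The third part presents cross-tabulated statistics based on the data structure and the source identification results of the first two parts. We hope this study will also serve as groundwork for future researchers of leishu and of the Gujin Tushu Jicheng, and shorten the time their research requires.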
Leishu (類書, books that assemble material by category) are a type of reference book developed in ancient China. A leishu first establishes a classification structure for its intended knowledge domain, then extracts passages from existing books and places them in the appropriate categories so that they can be retrieved and used conveniently later. Gujin Tushu Jicheng (古今圖書集成, Complete Collection of Graphs and Writings of Ancient and Modern Times), compiled in the 18th century during the Qing Dynasty, is the largest and most important extant leishu. It contains approximately 170 million characters, taken from over ten thousand ancient classics and books. In this thesis, we develop information technologies to make this work easier to browse and search. The thesis has three main parts. In the first part, we introduce the background and overall structure of Gujin Tushu Jicheng, design an automated procedure to identify and separate the entries in the book, and build a retrieval system by incorporating the restructured content into the THDL (Taiwan History Digital Library) shell. In the second part, we identify the sources of the entries automatically and systematically, correcting erroneous attributions and supplying missing ones. In the last part, we present statistics drawn from the analysis in the first two parts.