Finding Relevant Entries through User Feedback: Taiwan-Related Material in the Veritable Records of Qing (《清實錄》) as an Example

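The Veritable Records of Qing (《清實錄》) is an enormous historical work written in chronological (annalistic) form. It records, day by day, the activities and deeds of the Qing emperors over more than three hundred years, including the appointments and memorials of important officials, decrees issued by the emperor, population data, the transport of goods, and military campaigns. Because it was compiled officially and is rigorously structured, it is a precious source for historians of the Qing. In a work of this size, however, the question a scholar wants to explore may involve only a small fraction of the entries, and extracting those relevant entries is an important problem. Traditionally, once a historical work has been digitized, users look for relevant entries by keyword search, but relevant entries that do not contain the keywords cannot be found this way. The main goal of this thesis is to propose a search method for relevant entries that replaces keyword search by computing the relevance between entry contents. The user selects a small number of entries from the text, the method computes the relevance of every other entry to the selected ones, and the user browses the entries in descending order of relevance. After the user has collected more relevant entries, the relevance is recomputed, and this feedback loop is repeated until all the relevant entries the user needs have been found. In the 1990s, a group of scholars manually extracted from the Veritable Records of Qing the entries they judged to be related to Taiwan and compiled them into the 《清實錄臺灣史資料專輯》 (Veritable Records of Qing-Taiwan Selection). This thesis uses that book and the Veritable Records of Qing to test how well different relevance algorithms perform on historical documents, then designs an entry search method based on user feedback and implements a user-feedback relevant-entry search system for the Veritable Records of Qing on that basis. Finally, it uses the entries in the Taiwan Selection to find more Taiwan-related entries in the Veritable Records of Qing. The thesis has two main parts. The first explains how to match identical entries across the digitized data of the two works, introduces several methods of computing entry relevance, and tests their performance on the two works with a range of evaluation measures. The second designs a user-feedback relevant-entry search algorithm based on the best-performing relevance method and implements it as a system. Users operate the system to find more Taiwan-related entries in the Veritable Records of Qing, and the newly found entries are then briefly examined and statistically analyzed.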

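Keywords

Veritable Records of Qing (清實錄); Veritable Records of Qing-Taiwan Selection (清實錄臺灣史資料專輯); digital humanities; Taiwan History Digital Library; Taiwanese history; entry relevance; information retrieval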

Parallel Abstract

“The Veritable Records of Qing” is a comprehensive historical record compiled in chronological form. It documents the day-to-day activities of the emperors together with memorials, including the appointments and memorials of significant officials, imperial decrees, demographic information, cargo delivery, and military expeditions. Because it was compiled by imperial order and is rigorously structured, it is a valuable source for historians who study the Qing dynasty. However, a given research question usually concerns only a small portion of this huge collection, and extracting those relevant texts is a problem. Once such records are digitized, scholars can use keyword search to find relevant historical texts. Nevertheless, a relevant text that does not contain the chosen keywords cannot be found this way. This research proposes a method for finding relevant historical texts that computes the level of relevance between texts instead of relying on keyword search. Starting from a few texts of interest selected by the researcher, the method computes the level of relevance between the selected texts and every other candidate text, and lists the candidates in ranked order. The researcher chooses the texts of interest from this list and submits them as feedback. With this feedback, the method proceeds to the next iteration and finds texts that are even more likely to interest the researcher. In the 1990s, scholars manually retrieved the texts they judged relevant to Taiwan from “Veritable Records of Qing” and edited them into “Veritable Records of Qing-Taiwan Selection”. This research uses that edition together with “Veritable Records of Qing” to examine the performance of different relevance algorithms on historical records.
Next, a system based on a relevance feedback algorithm is proposed to give users and researchers an interface for searching for relevant texts in large historical records. Finally, the research uses “Veritable Records of Qing-Taiwan Selection” as an example to find more relevant historical texts in “Veritable Records of Qing” that were not previously chosen. The research is divided into two parts. The first part describes the method used to match corresponding texts across the two digitized historical records mentioned above, introduces different ways of computing the level of relevance between texts, and benchmarks these methods on the two records. The second part introduces the relevance feedback system, which is built on the best-performing method in the experiments. Finally, the texts found with this system in tests by historians are observed and analyzed.
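
The feedback loop described above can be illustrated with a minimal sketch. The abstract does not say which relevance measure performed best, so everything concrete here is an assumption made for illustration: texts are tokenized into character bigrams, weighted with TF-IDF, and compared by cosine similarity; a candidate's score is its mean similarity to the texts selected so far; and the loop stops when a page of top-ranked candidates yields no accepted text. The function names (`bigrams`, `tfidf_vectors`, `rank`, `feedback_search`), the `page_size` parameter, and the toy English strings are all hypothetical and are not taken from the thesis or from the historical records.

```python
import math
from collections import Counter


def bigrams(text):
    """Split text into overlapping character bigrams (assumed tokenization)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (as a dict) per document."""
    counts = [Counter(bigrams(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    vectors = []
    for c in counts:
        vec = {t: tf * (math.log(n / df[t]) + 1.0) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


def rank(vectors, selected, excluded):
    """Rank unseen texts by mean similarity to the selected texts."""
    scored = []
    for i, vec in enumerate(vectors):
        if i in selected or i in excluded:
            continue
        score = sum(cosine(vec, vectors[s]) for s in selected) / len(selected)
        scored.append((-score, i))  # ties are broken by index
    return [i for _, i in sorted(scored)]


def feedback_search(docs, seeds, judge, page_size=2):
    """Iterate: rank, let the user judge the top page, add accepted texts."""
    vectors = tfidf_vectors(docs)
    selected, rejected = set(seeds), set()
    while True:
        page = rank(vectors, selected, rejected)[:page_size]
        accepted = [i for i in page if judge(i)]
        selected.update(accepted)
        rejected.update(i for i in page if i not in accepted)
        if not accepted:  # assumed stopping rule: a page with no hits
            return selected


# Toy texts invented for this sketch (not quotations from any record).
docs = [
    "taiwan garrison rice shipment",
    "taiwan garrison troop rotation",
    "rice shipment to fujian coast",
    "canal repair in the north",
    "troop rotation on the island",
    "flood relief canal tax",
]
relevant = {0, 1, 4}  # stands in for the researcher's judgment
found = feedback_search(docs, seeds=[0], judge=lambda i: i in relevant)
print(sorted(found))
```

In this toy run, text 4 never mentions the keyword "taiwan", yet it becomes reachable once text 1 has been accepted, because the two share the phrase "troop rotation". This is the kind of case the abstract says keyword search misses.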

Parallel Keywords

Veritable Records of the Qing dynasty; digital humanities; THDL; Taiwanese history; text mining; information retrieval

