
臺灣歷史人物文本檢索與探勘系統之建置

Development of a Text Retrieval and Mining System for Taiwanese Historical People

Abstract


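People are one of the important entity types in historical research, so a thorough understanding of biographies supports research on historical events. Many biographical materials now exist as digital documents, yet combing through and compiling information from a large number of biographies by hand is very time-consuming, and information technology should be put to good use to assist historians. Moreover, although Taiwan has built many databases and has a variety of biographies and usable source documents, comparatively little work has gone into developing tools for examining and analyzing historical-person databases. In view of this, the authors formed a research team that takes 《新修彰化縣志‧人物志》 (the biographical volume of the newly revised Changhua County gazetteer) as its text source and develops analysis tools for database retrieval, full-text retrieval, text mining, and social network analysis to support research in history and the humanities. The long-term goal is to build the Taiwan Biographical Database (TBDB). This study describes the characteristics of the people currently included in TBDB, explains the system architecture, and reports preliminary results. It also proposes an algorithm for recognizing named entities in 《新修彰化縣志‧人物志》, illustrated with the recognition of poetry society names. The algorithm achieves a recall of 96% and a precision of 65%. Finally, the study discusses the lessons learned while building TBDB and directions for future development.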

Parallel Abstract


People are an important type of entity in the study of history, and a comprehensive understanding of biographies benefits research into historical events. In the digital era, many biographies are available in digital formats, but exploring massive collections of them for valuable findings remains time-consuming and labor-intensive for researchers. Information technologies can help researchers use this information efficiently. This article introduces the development of a text retrieval and mining system for Taiwanese historical people, the Taiwan Biographical Database (TBDB). It describes the characteristics of the people in TBDB, outlines the system architecture and preliminary results of TBDB, and proposes a method for recognizing named entities in the biographies, specifically poetry societies, which achieves a recall of 96% and a precision of 65%. Finally, the article discusses the lessons learned in creating TBDB and plans for future work.
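The recall and precision figures quoted above follow the standard definitions for evaluating an entity recognizer against a hand-labeled gold set. The sketch below is only an illustration of how such rates are computed. It is not the authors' algorithm or evaluation code: the function name `evaluate`, the placeholder labels, and the counts (312 correct hits, 168 spurious hits, 13 misses) are all hypothetical, chosen solely so that the arithmetic reproduces 65% precision and 96% recall.

```python
def evaluate(predicted: set, gold: set):
    """Return (precision, recall, f1) for a set of predicted entity names
    compared against a set of gold-standard entity names."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted names that are actually correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold names that the recognizer found.
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: placeholder labels, not real poetry society names.
gold = {f"society_{i}" for i in range(325)}            # 325 gold entities
predicted = {f"society_{i}" for i in range(312)}       # 312 correct hits
predicted |= {f"spurious_{i}" for i in range(168)}     # 168 false positives

precision, recall, f1 = evaluate(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```

With these made-up counts, high recall alongside lower precision means the recognizer misses few gold entities but also returns a sizeable number of spurious candidates. The F1 value printed here is derived from the two reported rates and is not a figure stated in the abstract.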

