
故宮博物院古文物之中文關鍵字檢索系統之研究

A Study on Keyword Retrieval System of Chinese Antiques in the National Palace Museum

Advisors: 鄒慶士, 賴鼎陞

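Abstract


The National Palace Museum in Taiwan preserves antiquities handed down from the past so that the public can study and appreciate them. With advances in technology, many people search for antiquities through the museum's artifact collection retrieval system. When searching by item name, the system returns only artifacts whose keywords exactly match the name stored in the database; if the query does not exactly match the name format, the artifact cannot be found. This study analyzes data from the artifact collection system, covering bronzes, jades, and porcelain, and applies three methods to improve the current retrieval system. First, a lexicon is built from the retrieval system's data; an indexing method extracts the keywords of each artifact, and results are ranked by a computed similarity value, which indicates how closely a keyword relates to an artifact. Second, a neural network predicts the next keyword likely to appear, and the retrieval interface displays the top-ranked predicted words for the user to choose from. Finally, Latent Dirichlet Allocation clustering identifies the keywords generated under each topic, and the search results suggest the top few keywords for the relevant vessel form and function to the user.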

Parallel Abstract


The National Palace Museum preserves an ancient legacy that the general public can study and appreciate. As a result of technological development, many people use the museum's search system to look for antiques. When searching by name, the system returns only antiques whose names exactly match the keywords in the established database; if the search keywords do not match a name in the database exactly, the search fails. This study examines and analyzes data from the National Palace Museum antiques system, in which the antiques are divided into bronze, jade, and porcelain, and proposes three ways to improve the current retrieval system. First, keywords are extracted by index from an established dictionary, and cosine similarity values, which indicate the relationship between keywords and antiques, are used to rank the search results. Second, a neural network predicts the next keyword, and the search interface displays the top few predicted words for the user to choose from. Lastly, Latent Dirichlet Allocation is used to obtain the keywords for each topic, and the search results recommend the top few keywords for the relevant vessel function to the user.
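The cosine-similarity ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the lexicon `LEXICON`, the artifact names in `ARTIFACTS`, the greedy longest-match extraction, and the function names are all assumptions made for illustration. It only shows how a query that matches no name exactly can still return ranked results.

```python
import math
from collections import Counter

# Hypothetical keyword lexicon (assumption); the thesis builds its own
# dictionary from the museum's collection data.
LEXICON = ["青花", "白瓷", "龍紋", "花卉", "玉", "銅", "瓶", "碗", "鼎", "璧"]

# Hypothetical artifact names, for illustration only.
ARTIFACTS = ["青花龍紋瓶", "青花花卉碗", "白瓷碗", "銅鼎", "玉璧"]


def extract_keywords(text, lexicon=LEXICON):
    """Greedy longest-match extraction of lexicon keywords from text."""
    words = sorted(lexicon, key=len, reverse=True)
    found, i = [], 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                found.append(word)
                i += len(word)
                break
        else:
            i += 1  # skip a character that no lexicon entry covers
    return found


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, artifacts=ARTIFACTS):
    """Return (score, name) pairs with a nonzero score, best match first."""
    q = Counter(extract_keywords(query))
    scored = [(cosine(q, Counter(extract_keywords(name))), name)
              for name in artifacts]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])


if __name__ == "__main__":
    # "青花瓶" equals no artifact name, so an exact-match search finds nothing;
    # similarity ranking still surfaces the related items.
    for score, name in search("青花瓶"):
        print(f"{score:.3f}  {name}")
```

Under these assumptions, the query shares two keywords with the first hypothetical name and one with the second, so both are returned in that order while unrelated items are dropped.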
