On Detecting Duplications in a Database of Taiwanese Old Photographs (臺灣老照片資料庫重複照片比對研究)

Advisor: 項潔 (Jieh Hsiang)

Abstract


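The Taiwan Old Photographs Database (URL http://photo.lib.ntu.edu.tw/pic/db/oldphoto.jsp) is built from the National Taiwan University Library's rich collection of Japanese colonial-era publications, which includes a large number of Taiwan-related books and periodicals. The photographs in these publications were digitized, and the resulting digital images form the core of the database. The database holds more than 38,000 records of photographs and their metadata, offers a comprehensive metadata search mechanism for online browsing, and allows both metadata and digital images to be downloaded within the scope of academic fair use.

However, different books edit and describe the same photograph in different ways, so a photo's content and its metadata can be inconsistent. As a result, duplicate photos exist that are hard to locate through text search, because the matching copies are described differently; conversely, a single text query can return redundant duplicate copies of the same photo.

This study therefore goes beyond searching text descriptions and metadata and applies content-based image retrieval (CBIR). Using CBIR, we design a semi-automatic workflow: compare the content of photos to measure their similarity, collect the highly similar photo pairs, and then identify the true duplicate pairs by manual inspection.

Finally, we applied this workflow to the 38,653 photos in the Taiwan Old Photographs Database system. Working at an estimated recall of 90% or higher, we inspected 308,286 candidate similar pairs and confirmed 3,270 duplicate pairs, which form 2,621 duplicate groups. This set of duplicate groups is provided to the unit that maintains the system, so that the redundancy caused by duplicate photos can be resolved.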
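The semi-automatic workflow described above (score photos against each other, keep only highly similar pairs, and hand those pairs to a human reviewer) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say which image features were used, so the sketch assumes each photo has already been reduced to a hypothetical global histogram and uses histogram intersection as a stand-in similarity measure. The names `candidate_pairs`, `normalize`, and `histogram_intersection`, and the `threshold` parameter, are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (an all-zero input stays zero)."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [v / total for v in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity of two normalized histograms: 1.0 is identical, 0.0 disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def candidate_pairs(
    features: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Score every pair of photos and keep those at or above `threshold`.

    Returns (photo_a, photo_b, score) tuples, most similar first: the queue
    a human reviewer would work through to confirm true duplicates.
    """
    normed = {pid: normalize(hist) for pid, hist in features.items()}
    pairs: List[Tuple[str, str, float]] = []
    # Brute force: compares all n * (n - 1) / 2 pairs.
    for a, b in combinations(sorted(normed), 2):
        score = histogram_intersection(normed[a], normed[b])
        if score >= threshold:
            pairs.append((a, b, score))
    # Highest similarity first; ties broken by photo ID for a stable order.
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy feature vectors; real ones would come from the scans.
    photos = {"p1": [4, 4, 2], "p2": [4, 4, 2], "p3": [0, 1, 9]}
    for a, b, score in candidate_pairs(photos, threshold=0.8):
        print(a, b, round(score, 3))  # prints: p1 p2 1.0
```

The brute-force loop is quadratic: for 38,653 photos it would score about 747 million pairs, so a practical system would likely need cheaper features or an index. Lowering `threshold` raises recall at the cost of more pairs to review by hand, which is the kind of trade-off involved in inspecting 308,286 candidate pairs to reach an estimated recall of at least 90%.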

English Abstract


In 2003, the National Taiwan University Library produced a digital collection of old photographs of Taiwan. The photographs cover the period from 1895 to 1945, when Taiwan was occupied by Japan. The photos, 38,653 in total, were selected from over 2,000 books published by the Japanese Colonial Government during that period, and cover a wide range of subjects. They were made into a digital library with images and metadata records, which is the most extensive database of its kind in existence. We observed that the database contains duplicate photos, either because certain photos were included in more than one book or because some books were scanned twice. The purpose of the research reported in this thesis is to find these duplicate images. We adopted methods from content-based image retrieval and developed a system that identifies pairs of images that might have come from the same photo. The pairs were then checked manually to determine whether they were indeed duplicates. Across the photographs in the database, our system identified 308,286 candidate pairs, of which 3,270 were confirmed duplicate pairs. Because some photos appear more than twice (up to 9 times), these pairs form 2,621 duplicate groups altogether. We estimate that the recall rate is over 90%.
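Because a photo can appear more than twice, confirmed duplicate pairs have to be merged into groups. One plausible way to do this, assuming a group is every photo connected through a chain of confirmed pairs, is a union-find (disjoint-set) structure, sketched below. The abstract does not say how the 2,621 groups were actually formed, and the names `UnionFind` and `duplicate_groups` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        # Register an unseen item as its own singleton set.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def duplicate_groups(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge confirmed duplicate pairs into groups (connected components)."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups: Dict[Hashable, List[str]] = {}
    for photo in list(uf.parent):
        groups.setdefault(uf.find(photo), []).append(photo)
    # Sort within and across groups so the output order is deterministic.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    confirmed = [("A", "B"), ("B", "C"), ("D", "E")]
    print(duplicate_groups(confirmed))  # prints [['A', 'B', 'C'], ['D', 'E']]
```

In the usage example, three confirmed pairs collapse into two groups, {A, B, C} and {D, E}. Whenever a photo appears more than twice, the number of groups ends up smaller than the number of pairs, as with 3,270 pairs forming 2,621 groups.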
