  • Thesis


Applying Cloud Computing to Large-scale Trademark Image Retrieval System

Advisor: 劉震昌


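New trademark images are registered with the Trademark Office every day, and its image database keeps growing over time. With the search functions the Trademark Office provides, it is hard to find a given trademark image quickly in such a large database. This thesis studies query-by-image retrieval of trademark images and supports three applications: search, watch, and genuine/counterfeit identification (illegal detection). The images come from the Trademark Office, and the database contains 1,034,092 images in total. Because a trademark image is hard to describe in words, we use the Scale-Invariant Feature Transform (SIFT) as the image feature descriptor and quantize the SIFT features hierarchically with a vocabulary tree, whose tree structure speeds up matching. The database holds 1,034,092 images and 240,558,829 SIFT features, too many to cluster on a single desktop computer, so we set up a Hadoop platform to train the vocabulary tree. Finally, the visual words are stored in an inverted index, their TF-IDF weights are computed, and image similarity is calculated with several distance formulas. In the experiments, 200 query images were tested against the database of 1,034,092 images, and the top-1 returned image was correct for 78% of the queries. The main contribution of this thesis is a large-scale trademark image retrieval system that lets users query a trademark image quickly through a web interface.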
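The hierarchical quantization step can be illustrated with a minimal sketch. It assumes a vocabulary tree that has already been trained; the `Node` class, the `quantize` function, and the toy 2-D tree are hypothetical names and data chosen for illustration (real SIFT descriptors are 128-dimensional), not the thesis's actual implementation. At each level the descriptor is compared only with the children of the current node, which is why a tree is faster than comparing against every visual word.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One cluster center in a (hypothetical) trained vocabulary tree."""
    center: Sequence[float]
    children: List["Node"] = field(default_factory=list)
    word_id: int = -1  # meaningful only on leaves (visual word id)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(root: Node, descriptor: Sequence[float]) -> int:
    """Descend the tree, picking the nearest child at each level,
    and return the visual word id of the leaf that is reached."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(c.center, descriptor))
    return node.word_id


# Toy tree with branching factor 2 and depth 2 (assumed values),
# using 2-D points in place of 128-D SIFT descriptors.
toy_tree = Node(center=(0.0, 0.0), children=[
    Node(center=(0.0, 0.0), children=[
        Node(center=(-1.0, 0.0), word_id=0),
        Node(center=(1.0, 0.0), word_id=1),
    ]),
    Node(center=(10.0, 10.0), children=[
        Node(center=(9.0, 10.0), word_id=2),
        Node(center=(11.0, 10.0), word_id=3),
    ]),
])

word_near_origin = quantize(toy_tree, (0.9, 0.2))
word_far_corner = quantize(toy_tree, (10.6, 9.5))
```

With branching factor k and depth d, a lookup costs about k × d distance computations instead of the k^d needed to scan every leaf.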


Many trademark images are registered every day, so the trademark database of the Trademark Office grows steadily. It is difficult, however, to find a trademark in this large-scale image database with the basic search utilities provided by the Trademark Office. In this thesis, a content-based image retrieval system is developed to search trademark images. The system supports three applications: trademark search, watch, and illegal (counterfeit) detection. A total of 1,034,092 trademark images were crawled from the Trademark Office. Because it is difficult to describe a trademark image with keywords, we apply the Scale-Invariant Feature Transform (SIFT) to extract image features. These SIFT features are hierarchically clustered with the vocabulary tree method, which speeds up image similarity search. The number of extracted SIFT features is 240,558,829. Because this many image features cannot be clustered on a single desktop computer, we set up a Hadoop platform to train the vocabulary tree. The quantized SIFT features, called visual words, were stored in an inverted index and their TF-IDF weights were calculated. Several distance measures were tried for computing image similarity. In our experiments, 200 images were used as queries against the database of 1,034,092 trademark images, and the top-1 precision reached 78%. The contribution of this thesis is a practical large-scale trademark image retrieval system that allows users to query a trademark image efficiently through a web interface.
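The indexing and scoring steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: images are assumed to be already reduced to lists of visual word ids, cosine similarity stands in for the unspecified distance measures, and `build_index`, `search`, `top1_precision`, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(db):
    """db maps image_id -> list of visual word ids.
    Returns (postings, idf, norms) for TF-IDF cosine scoring."""
    n_images = len(db)
    postings = defaultdict(dict)  # word -> {image_id: term frequency}
    for image_id, words in db.items():
        for word, count in Counter(words).items():
            postings[word][image_id] = count / len(words)
    idf = {w: math.log(n_images / len(p)) for w, p in postings.items()}
    sq_norms = defaultdict(float)
    for word, p in postings.items():
        for image_id, tf in p.items():
            sq_norms[image_id] += (tf * idf[word]) ** 2
    norms = {i: math.sqrt(v) for i, v in sq_norms.items()}
    return postings, idf, norms


def search(query_words, postings, idf, norms):
    """Return image ids ranked by cosine similarity of TF-IDF vectors.
    Only images sharing a word with the query are touched."""
    weights = {w: c / len(query_words) * idf[w]
               for w, c in Counter(query_words).items() if w in idf}
    q_norm = math.sqrt(sum(v * v for v in weights.values()))
    if q_norm == 0:
        return []
    scores = defaultdict(float)
    for word, qw in weights.items():
        for image_id, tf in postings[word].items():
            scores[image_id] += qw * tf * idf[word]
    ranked = sorted(((s / (q_norm * norms[i]), i)
                     for i, s in scores.items() if norms[i] > 0),
                    reverse=True)
    return [image_id for _, image_id in ranked]


def top1_precision(queries, postings, idf, norms):
    """Fraction of (query_words, expected_id) pairs whose first hit is correct."""
    hits = 0
    for words, expected in queries:
        ranked = search(words, postings, idf, norms)
        hits += bool(ranked) and ranked[0] == expected
    return hits / len(queries)


# Toy database of three images (assumed data).
db = {"a": [1, 1, 2], "b": [2, 3, 3], "c": [4, 4, 5]}
postings, idf, norms = build_index(db)
queries = [([1, 1, 2], "a"), ([3, 2], "b"), ([4, 2], "c"), ([2], "c")]
precision = top1_precision(queries, postings, idf, norms)
```

The last toy query contains only a word that image "c" lacks, so it is a deliberate miss. The reported 78% corresponds to the same ratio computed over the 200 real queries.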


