
應用深度遷移式學習建構以語意相似度為基之商標圖像檢索系統

Develop a Semantic-based Trademark Logo Image Retrieval System Using Transfer Deep Learning Approach

Advisors: 張瑞芬, 張力元
The full text will be available for download on 2024/07/11.

Abstract


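Image retrieval has made breakthrough progress with the growth of deep learning technology. Deeper neural networks for extracting image features, together with improved algorithms, allow network models to recognize more diverse image features in a way that resembles human visual reasoning, rather than matching images only on traditional low-level features. In the development of image retrieval patents and literature over recent years, many studies have worked to reduce the gap between the information retrieved by machines and human visual semantics, but retrieval becomes more challenging as images grow in volume and complexity. As deeper neural networks extract increasingly abstract semantic features, this study uses a transfer deep learning approach to build a trademark similarity retrieval system named LogoSimNet. In the era of digital transformation, a huge volume of trademark images is spread across the globe, and this raises some legal issues. Trademark offices examine trademark images for duplication or similarity under the Vienna Classification, performing similarity analysis and matching retrieval through annotated retrieval codes. The steadily growing number of trademarks makes examination more difficult and more time-consuming for trademark office examiners. On the other hand, because online information is so widely accessible, users can easily copy others' graphic designs from images on the internet, and careless use may lead to copyright and trademark infringement disputes. These contentious issues highlight the importance of building an automated, intelligent trademark image retrieval system. Because trademark designs carry diverse visual semantics, a key challenge is how to transfer the manual work of matching similar images to computer vision retrieval. In addition, this study uses a technology-function matrix to analyze the patent landscape of the image retrieval field over the past ten years. Most published patents use traditional machine learning methods; deep learning methods are still a minority by comparison, but deep learning remains an important area of image retrieval. Given the need for automated image retrieval described above and the patent development trend of deep learning, this study develops a deep learning method based on a triplet network that trains the neural network on multiple kinds of similarity. For the training data, part of the Logo-2K+ trademark dataset was reorganized into a training set (more than 26,000 images) and a test set (more than 9,000 images) to fine-tune and validate a pre-trained ResNet50V2 model. The trademark retrieval results support similarity analysis across different visual semantics, and model validation reaches a Recall@16 of 95%.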

Parallel Abstract


Image retrieval (IR) technology has made breakthrough progress in recent years owing to the growth of deep learning. With features extracted by deep neural network models, a machine can learn richer semantic visual features, not just traditional low-level features. A review of recent IR patents and non-patent literature shows that many studies focus on reducing the semantic gap between machine learning results and human visual understanding. Because deep learning methods can extract more semantic features, this research uses a transfer deep learning approach to construct a trademark (TM) retrieval system named LogoSimNet. In the era of digital transformation, a huge number of logos circulate widely, which raises some potential issues. The TM office examines duplicate or similar TM patterns under the Vienna Classification rules, in which human-labeled Vienna codes are used for similarity analysis and image retrieval. The growing number of TMs makes examination more difficult for the TM office during the initial stage of TM review and registration. Moreover, because users can easily download images from the internet and imitate TM graphic designs, such use is prone to copyright infringement. These contentious issues highlight the importance of developing an automatic and intelligent logo IR methodology. Given the complexity of TM visual semantics, how to carry out manual similarity examination through computer vision retrieval becomes a key challenge. Furthermore, this research analyzes the patent trend in the field of IR with a Technology Function Matrix. Although most of the published patents use traditional machine learning methods, deep learning methods are still an important field of IR. Motivated by the above, this research develops a method of logo image similarity analysis using a triplet network architecture.
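As a rough illustration of the triplet idea mentioned above, the following minimal sketch computes a standard margin-based triplet loss on toy embedding vectors. It is not the LogoSimNet training code: the function names, the Euclidean distance, the margin value of 0.2, and the example vectors are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss (margin value is an assumed example).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D "embeddings" (hypothetical values, not real logo features).
anchor = [0.0, 0.0]
similar_logo = [0.0, 1.0]      # distance 1 from the anchor
dissimilar_logo = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: no penalty.
easy = triplet_loss(anchor, similar_logo, dissimilar_logo)
# Roles swapped: the "positive" is far away, so the loss is large.
hard = triplet_loss(anchor, dissimilar_logo, similar_logo)
print(easy, hard)
```

In a real setting the three vectors would come from the same embedding network applied to an anchor logo, a similar logo, and a dissimilar logo, and the loss would be minimized over many such triplets.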
This research uses a logo image training set (more than 26 thousand images) and a testing set (more than 9 thousand images) from the Logo-2K+ database to fine-tune and verify a ResNet50V2 model. The results show that the LogoSimNet model can retrieve logos according to multiple visual semantics. Model verification shows excellent results, with Recall@16 exceeding 95%.
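For readers unfamiliar with the Recall@K metric, the sketch below shows one common way to compute it for retrieval: a query counts as a hit if at least one of its K nearest neighbours (excluding itself) shares its label. This definition, the function name `recall_at_k`, and the toy data are assumptions for illustration; the abstract does not specify how Recall@16 was computed.

```python
import math


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with at least one same-label item in the top k.

    Each item is used as a query against all other items, ranked by
    Euclidean distance (an assumed choice of distance).
    """
    n = len(embeddings)
    hits = 0
    for i in range(n):
        # Rank every other item by distance to the query.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(embeddings[i], embeddings[j]))
        if any(labels[j] == labels[i] for j in others[:k]):
            hits += 1
    return hits / n


# Toy data (hypothetical): the last "a" item sits next to the "b" cluster.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [4.0, 4.0]]
labels = ["a", "a", "b", "b", "a"]

r1 = recall_at_k(embeddings, labels, 1)
r3 = recall_at_k(embeddings, labels, 3)
print(r1, r3)
```

Increasing K can only keep the score the same or raise it, which is why a reported recall should always be read together with its K.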

