Title

影像檢索與分類的視覺字典的研究

Translated Titles

The Study of Image Retrieval and Categorization using Visual Vocabulary

Authors

郭章明

Key Words

Visual words ; Macro-sense ; Micro-sense ; Retrieval ; Categorization

Publication Name

Thesis, Department of Information Engineering, I-Shou University

Volume or Term/Year and Month of Publication

2017

Academic Degree Category

Ph.D.

Advisor

郭忠民

Content Language

English

Chinese Abstract

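The visual vocabulary approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind it is that an image can be represented as visual words formed from collections of local features, and that various visual words can in turn be composed into semantic objects, lifting low-level features to high-level semantics and thereby narrowing the semantic gap. In this dissertation, visual words carry color, structure, and texture characteristics, and three approaches to building a visual-word-based vocabulary are explored: (1) a feature-point-based approach, which extracts keypoints with the scale-invariant feature transform (SIFT) and adds color information, remedying the lack of color in conventional SIFT and making the visual words richer and more complete; (2) a block-based approach, which obtains image content features by partitioning the image into blocks; and (3) an approach that combines feature points and blocks. Taking the homogeneity of visual words into account, this study proposes a novel investigation of the descriptive power of visual words: it introduces the concepts of macro-sense and micro-sense into the visual vocabulary, builds image descriptors from them to describe image content in more detail, and applies these descriptors to image retrieval. For image categorization, building on the macro- and micro-sense visual vocabulary and refining the macro- and micro-sense weights in the feature model, category models are constructed from the various semantic objects composed of visual words, so that each category model is distinctive and unique; a probabilistic classifier then assigns each image to the most likely category. Experimental results show that the macro- and micro-sense visual vocabulary performs well in both image retrieval and categorization.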
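As a rough illustration of the block-based route in (2), the sketch below divides an RGB image into non-overlapping blocks, describes each block by its mean color, maps each block to its nearest codeword, and counts the results as a normalized visual-word histogram. This is a minimal sketch under stated assumptions, not the thesis's method: the mean-color block descriptor, the hand-given codebook (in practice one would typically learn it, e.g. by k-means), and the function names `block_descriptors`, `nearest_word`, and `bovw_histogram` are all hypothetical; the dissertation also uses structure and texture features.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: an image is a list of rows of (R, G, B) pixels.
Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def block_descriptors(image: Image, block: int) -> List[Pixel]:
    """Describe each non-overlapping block x block tile by its mean RGB color.

    Tiles that would run past the right or bottom edge are skipped.
    """
    height, width = len(image), len(image[0])
    n = block * block
    descriptors = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            sums = [0.0, 0.0, 0.0]
            for y in range(top, top + block):
                for x in range(left, left + block):
                    for c in range(3):
                        sums[c] += image[y][x][c]
            descriptors.append((sums[0] / n, sums[1] / n, sums[2] / n))
    return descriptors


def nearest_word(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to desc in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])),
    )


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """L1-normalized histogram counting how often each visual word occurs."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

For a 4x4 image whose left half is red and right half is blue, 2x2 blocks with the codebook [red, blue, green] yield the histogram [0.5, 0.5, 0.0]: two blocks map to the red word, two to the blue word, and none to green.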

English Abstract

The visual vocabulary representation approach has been successfully applied to many multimedia and vision applications, including visual recognition, image retrieval, and scene modeling/categorization. The idea behind visual vocabulary representation is that an image can be represented by visual words derived from collections of its local features. In my dissertation, I develop a new scheme for constructing a visual vocabulary based on an analysis of visual word contents. By considering the content homogeneity of visual words, the resulting visual vocabulary contains macro-sense and micro-sense visual words, and the two types are further combined to describe an image effectively. For micro-sense visual words, we investigate their effectiveness from several viewpoints. First, SIFT is used for feature point extraction, and color features are added to build a new SIFT-based feature descriptor that improves on conventional, color-free SIFT. Second, we consider block-based visual words as micro-sense descriptors and compare their advantages. Third, we discuss the advantages of macro- and micro-sense visual words built on point-based or block-based visual words. In this work, we construct a new visual vocabulary for image retrieval and categorization that takes the characteristics of visual words into account. By accounting for the inhomogeneous and incomplete content of visual words, we design a visual vocabulary that can describe the different semantics in an image more effectively. Performance evaluations for both applications indicate that the proposed visual vocabulary system achieves promising results.
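The abstract says macro- and micro-sense visual words are combined to describe an image, but does not give the combination rule or the retrieval similarity. The sketch below shows one plausible reading, purely as an assumption: each image has a macro-sense and a micro-sense visual-word histogram, the two are concatenated with a single hypothetical weight `alpha`, and database images are ranked by histogram intersection, a common similarity for bag-of-words histograms. The names `combine`, `intersection`, and `rank` are illustrative, not taken from the thesis.

```python
from typing import Dict, List, Sequence


def combine(macro: Sequence[float], micro: Sequence[float], alpha: float) -> List[float]:
    """Concatenate macro- and micro-sense histograms weighted by alpha and 1 - alpha.

    The single weight alpha is a hypothetical stand-in for the thesis's weighting.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * v for v in macro] + [(1.0 - alpha) * v for v in micro]


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: sum of bin-wise minima (larger means more similar)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same length")
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank(query: Sequence[float], database: Dict[str, Sequence[float]]) -> List[str]:
    """Database image ids sorted from most to least similar to the query."""
    return sorted(database, key=lambda k: intersection(query, database[k]), reverse=True)
```

With `alpha = 0.5`, a query built from macro [1, 0] and micro [0.5, 0.5] scores 0.75 against an image with macro [1, 0] and micro [0, 1], but only 0.5 against one with macro [0, 1] and micro [0.5, 0.5], so the matching macro-sense content ranks first. Raising `alpha` shifts the ranking toward macro-sense agreement.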

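Topic Category

Basic and Applied Sciences > Information Science
College of Electrical and Information Engineering > Department of Information Engineering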