Evaluation of Product Quantization for Image Recognition (評估乘積量化用於影像辨識之效能)

Advisor: 朱威達

Abstract


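Computing the Euclidean distance between two data points is a basic step in applications such as data clustering, data classification, and data mining. In many multimedia applications, however, distance computation in high dimensions suffers from efficiency problems and the curse of dimensionality. Product quantization is a quantization scheme that handles these problems effectively. Its core idea is to decompose the original high-dimensional vector space into a Cartesian product of several low-dimensional subspaces and to quantize each subspace separately. In this thesis, we introduce product quantization and three important factors: (1) the method used to split vectors; (2) the dimension of the subvectors; and (3) the number of quantization levels of each subquantizer. We briefly discuss how these factors affect the design of a product quantizer, and we design five categories of experiments to investigate comprehensively how they affect object recognition and smile recognition performance based on pyramid histogram of oriented gradients (PHOG) and pyramid local binary pattern (PLBP) descriptors. Finally, we summarize the experimental results as setting recommendations for related research. Through this evaluation, we present setting principles that previous studies did not discuss clearly, and we offer researchers advice on building better product quantizers for various image recognition tasks.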
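
The decomposition described above can be written compactly in the notation commonly used for product quantization. The symbols $D$, $m$, $D^*$, and $k^*$ are assumptions introduced here for illustration; the abstract itself does not name them.

```latex
\begin{align}
x &= \bigl(u_1(x), \dots, u_m(x)\bigr),
  & u_j(x) &\in \mathbb{R}^{D^*}, \quad D^* = D/m, \\
q(x) &= \bigl(q_1(u_1(x)), \dots, q_m(u_m(x))\bigr),
  & q_j &: \mathbb{R}^{D^*} \to \mathcal{C}_j, \quad |\mathcal{C}_j| = k^*, \\
\mathcal{C} &= \mathcal{C}_1 \times \cdots \times \mathcal{C}_m,
  & |\mathcal{C}| &= (k^*)^m .
\end{align}
```

In this notation the three factors correspond to how coordinates are assigned to the subvectors $u_j(x)$, the subvector dimension $D^*$, and the number of levels $k^*$ per subquantizer. As a purely illustrative calculation (these numbers are not taken from the thesis), $D = 128$, $m = 8$, and $k^* = 256$ give $D^* = 16$, a code of $m \log_2 k^* = 64$ bits per vector, and $(k^*)^m = 2^{64}$ representable codewords, while only $m \cdot k^* \cdot D^* = 32768$ centroid coordinates need to be stored.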

Parallel Abstract


Measuring distances between data points, e.g., Euclidean distances, is a fundamental step in data clustering, classification, and retrieval. However, in many multimedia applications, high-dimensional distance measurement suffers from efficiency issues and the curse of dimensionality. Product quantization is an effective quantization scheme for dealing with these issues: a high-dimensional space is decomposed into a Cartesian product of low-dimensional subspaces, and each subspace is quantized separately. In this thesis, we introduce product quantization and three of its important factors: 1) the way vectors are split; 2) the dimension of the subvectors; and 3) the number of quantization levels for each subquantizer. We briefly discuss how these factors bear on the design of a product quantizer, and then design five categories of experiments to investigate comprehensively how they influence the performance of recognition applications, e.g., object recognition and smile recognition, based on pyramid of histogram of oriented gradients (PHOG) and pyramid of local binary pattern (PLBP) descriptors. Finally, we summarize the experimental results as setting suggestions for related studies. Through this evaluation, we reveal design principles that have not been well investigated before and offer them to researchers as guidance for designing better product quantizers for various image recognition applications.
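
To make the scheme concrete, the following is a minimal sketch of a product quantizer, not the code used in the thesis. It assumes a contiguous split of each vector into `m` equal subvectors, plain Lloyd's k-means as the subquantizer trainer, and synthetic Gaussian vectors in place of PHOG or PLBP descriptors. All function names (`kmeans`, `train_pq`, `encode`, `decode`, `asymmetric_sq_dist`) are hypothetical. The parameters `m` and `k_star` expose two of the three factors named above (subvector dimension and quantization levels per subquantizer); the splitting method is fixed to contiguous blocks for simplicity.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training points.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def train_pq(train, m, k_star):
    """Learn one codebook per subspace, using a contiguous split into m parts."""
    sub_dim = train.shape[1] // m
    return [kmeans(train[:, j * sub_dim:(j + 1) * sub_dim], k_star, seed=j)
            for j in range(m)]

def encode(x, codebooks):
    """Map each vector to m centroid indices, one per subspace."""
    sub_dim = codebooks[0].shape[1]
    codes = np.empty((len(x), len(codebooks)), dtype=np.int64)
    for j, cb in enumerate(codebooks):
        sub = x[:, j * sub_dim:(j + 1) * sub_dim]
        d2 = ((sub[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        codes[:, j] = d2.argmin(axis=1)
    return codes

def decode(codes, codebooks):
    """Reconstruct vectors by concatenating the selected centroids."""
    return np.hstack([cb[codes[:, j]] for j, cb in enumerate(codebooks)])

def asymmetric_sq_dist(query, codes, codebooks):
    """Approximate squared distances from a raw query to encoded vectors."""
    sub_dim = codebooks[0].shape[1]
    total = np.zeros(len(codes))
    for j, cb in enumerate(codebooks):
        q_sub = query[j * sub_dim:(j + 1) * sub_dim]
        table = ((cb - q_sub) ** 2).sum(axis=1)  # one entry per centroid
        total += table[codes[:, j]]              # table lookup per database code
    return total

# Synthetic stand-in for image descriptors: 500 vectors of dimension 16.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 16))

codebooks = train_pq(data, m=4, k_star=16)  # 4 subspaces of dimension 4
codes = encode(data, codebooks)
recon = decode(codes, codebooks)
mse = ((data - recon) ** 2).sum(axis=1).mean()
approx = asymmetric_sq_dist(data[0], codes, codebooks)

print("code shape:", codes.shape)
print("mean squared quantization error:", mse)
```

Varying `m` and `k_star` in this sketch changes the code length and the quantization error, which is the kind of trade-off the experiments in the thesis are described as examining. The lookup-table distance in `asymmetric_sq_dist` equals the exact squared distance between the query and each reconstructed vector, so it avoids decoding the database while giving the same ranking as comparing against `recon` directly.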

