High-Performance SIFT (Scale Invariant Feature Transform) Hardware Design for Efficient Image Feature Extraction

採用 SIFT 演算法之高效能影像特徵點擷取硬體設計

Advisor: 黃錫瑜

Abstract


Feature extraction algorithms have been an active area of computer vision in recent years. Extracting robust features is computationally expensive, so these algorithms run far from real time on desktop computers. Scale-Invariant Feature Transform (SIFT) is one of the state-of-the-art algorithms in this domain. In this work, we propose what is, to our knowledge, the first all-hardware SIFT accelerator, designed in TSMC 0.18 µm CMOS technology. The proposed architecture consists of two interacting hardware components, one for key point identification and the other for feature descriptor generation. A segment algorithm and a buffer scheme supply data efficiently to the computing modules and reduce the memory requirement by about 49.88% compared with a recent work. With a three-stage pipeline and a parallel architecture, key point identification takes about 3.4 milliseconds for one VGA image. Including feature descriptor generation, the total operation time for a VGA image is within 33 milliseconds when the number of features is fewer than about 890.

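Parallel Abstract (translated from the Chinese)


In the field of computer vision, feature extraction algorithms have attracted considerable attention in recent years. To extract robust feature points from an image, most of these algorithms have very high complexity. As a result, they take a long time to run on desktop computers and can hardly achieve real-time operation. Among existing feature extraction algorithms, Scale-Invariant Feature Transform (SIFT) is regarded as one of the leading ones. In this thesis we propose what is, to our knowledge, the first all-hardware SIFT accelerator design, implemented with a TSMC 0.18 µm standard cell library. The proposed architecture consists of two interacting hardware modules, one for key point identification and the other for feature descriptor generation. The segment algorithm and segment buffer scheme that we use supply input data efficiently to each hardware module and reduce memory usage by about 49.88% compared with an existing method. With a three-stage pipeline and a parallel architecture, key point identification for a VGA-size image takes only about 3.4 milliseconds. Furthermore, when a VGA image contains fewer than 890 feature points, the total computation time, including feature descriptor generation, stays within 33 milliseconds, which is sufficient to support real-time video processing.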
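
The timing figures in the abstract can be turned into a rough budget. The sketch below is only illustrative: the function names are hypothetical, and the per-descriptor estimate assumes that the total time is the key point identification time plus a fixed cost per feature, with no overlap between the two hardware components. The abstract does not state that model, and since the two components are described as interacting, the real per-descriptor cost may differ.

```python
# Figures quoted in the abstract (all per VGA image).
KEYPOINT_MS = 3.4        # key point identification time
TOTAL_BUDGET_MS = 33.0   # upper bound on total operation time
MAX_FEATURES = 890       # approximate feature count at which the bound holds


def frames_per_second(frame_ms: float) -> float:
    """Frame rate sustained when each frame takes frame_ms milliseconds."""
    return 1000.0 / frame_ms


def per_descriptor_us(total_ms: float = TOTAL_BUDGET_MS,
                      keypoint_ms: float = KEYPOINT_MS,
                      features: int = MAX_FEATURES) -> float:
    """Time left per descriptor, in microseconds.

    ASSUMPTION (not from the thesis): total = keypoint + features * per_descriptor,
    i.e. the two stages run back to back without overlapping.
    """
    return (total_ms - keypoint_ms) * 1000.0 / features


if __name__ == "__main__":
    print(f"frame rate at 33 ms per frame: {frames_per_second(TOTAL_BUDGET_MS):.1f} fps")
    print(f"budget per descriptor (assumed model): {per_descriptor_us():.1f} us")
```

Under these assumptions, 33 milliseconds per frame corresponds to roughly 30 frames per second, which matches the real-time video claim, and about 33 microseconds remain for each descriptor at 890 features.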

