Stereo Vision with an Improved SIFT Algorithm for SLAM Systems and Its Implementation on FPGA

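This thesis designs and implements a stereo vision image recognition system based on the scale-invariant feature transform (SIFT), realized as a hardware acceleration circuit on a field-programmable gate array (FPGA). The system can be applied to simultaneous localization and mapping (SLAM), where it addresses the image matching and map building that a vision-based robot needs for autonomous navigation. With the proposed vision system, a robot in an unknown environment can compare every captured frame in real time with high computational efficiency, match the feature points common to the two images of the stereo camera, and use the geometry of the stereo camera to compute the distance from each feature point to the camera, achieving accurate image matching and distance estimation.

This thesis proposes a new gradient computation method and a method for reducing the dimension of the feature descriptor, which substantially reduce the hardware usage of SIFT and speed up computation. It also proposes a stereo matching method that takes images from the KITTI dataset as input and uses epipolar geometry together with a restricted search range to perform stereo matching and compute depth. The design is implemented on an Altera DE2i-150 board at an operating frequency of 50 MHz, using stereo images from the KITTI dataset with the central 640×370 region of each image cropped as input. For 640×480 input images, SIFT achieves a frame rate of 205 fps and uses 54,911 logic elements. For 640×370 input images, the stereo vision SIFT image recognition system achieves a frame rate of 181 fps and uses 140,303 logic elements.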

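Keywords

Stereo vision; image recognition; scale-invariant feature transform (SIFT) algorithm; feature matching; field-programmable gate array (FPGA)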

Parallel Abstract

This thesis proposes a stereo vision image recognition system based on the scale-invariant feature transform (SIFT), implemented with an FPGA hardware acceleration circuit. It can be applied to SLAM systems to improve the image matching and map building required by a vision-based robot during autonomous navigation. With the designed vision system, a robot in an unknown environment can compare each captured frame in real time with high computational efficiency. It then matches the feature points common to the two images of the stereo camera. Finally, using the geometry of the stereo camera, it calculates the distance from each feature point to the camera, achieving accurate image matching and distance estimation. A new gradient calculation method and a method for reducing the dimension of the feature descriptor are proposed, which greatly reduce the hardware usage of SIFT and speed up computation. A stereo matching method is also proposed, which takes images from the KITTI dataset as input and uses epipolar geometry together with a restricted search range to perform stereo matching and depth calculation. The design is implemented on an Altera DE2i-150 board at an operating frequency of 50 MHz, using stereo images from the KITTI dataset with the central 640×370 region of each image cropped as input. For 640×480 input images, SIFT achieves a frame rate of 205 fps with a total usage of 54,911 logic elements. For 640×370 input images, the stereo vision SIFT image recognition system achieves a frame rate of 181 fps with a total usage of 140,303 logic elements.
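The stereo matching and depth steps summarized above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design of the thesis: it assumes rectified images, so that epipolar lines coincide with image rows, and the function name `match_stereo`, the row tolerance, the disparity limit, the sum-of-absolute-differences descriptor distance, and the default focal length and baseline are all hypothetical choices made for the example. Depth uses the standard rectified-stereo relation Z = f·B/d, where d is the disparity in pixels.

```python
import numpy as np


def match_stereo(kp_left, desc_left, kp_right, desc_right,
                 row_tol=1.0, max_disparity=64.0,
                 focal_px=700.0, baseline_m=0.5):
    """Match features between rectified left/right images and estimate depth.

    kp_*   : (N, 2) float arrays of (x, y) pixel coordinates.
    desc_* : (N, D) float arrays of feature descriptors.
    All default parameter values are illustrative placeholders.
    Returns a list of (left_index, right_index, depth_in_metres).
    """
    matches = []
    for i, ((xl, yl), dl) in enumerate(zip(kp_left, desc_left)):
        # Disparity of every right keypoint relative to this left keypoint.
        disparity = xl - kp_right[:, 0]
        # Epipolar constraint: candidates must lie on (nearly) the same row.
        on_row = np.abs(kp_right[:, 1] - yl) <= row_tol
        # Restricted range: keep only positive disparities up to the limit.
        in_range = (disparity > 0) & (disparity <= max_disparity)
        candidates = np.flatnonzero(on_row & in_range)
        if candidates.size == 0:
            continue
        # Assumed metric: sum of absolute descriptor differences.
        dists = np.abs(desc_right[candidates] - dl).sum(axis=1)
        j = candidates[np.argmin(dists)]
        # Depth from disparity for a rectified stereo pair: Z = f * B / d.
        depth = focal_px * baseline_m / disparity[j]
        matches.append((i, int(j), float(depth)))
    return matches
```

Restricting candidates to one row and a bounded disparity window before comparing descriptors keeps the number of comparisons per feature small, which is the general motivation for combining an epipolar constraint with a limited search range.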

Parallel Keywords

Stereo Vision; Image Recognition; SIFT; Feature Matching; FPGA
