
A Visual Simultaneous Localization and Mapping System and Its Implementation on FPGA

FPGA-Based Implementation for Visual Simultaneous Localization and Mapping System

Advisors: 許陳鑑, 王偉彥

Abstract


This dissertation addresses the simultaneous localization and mapping (SLAM) problem for robots by proposing a visual SLAM (V-SLAM) system based on linear models, together with FPGA hardware acceleration circuits, to realize a low-cost, low-power, and computationally efficient system that allows a robot moving through an unknown environment to build a 3D map in real time while estimating its own state within that map. The linear-model-based V-SLAM system exploits the strengths of the SIFT algorithm to detect image feature points, and uses the feature information together with a key-frame selection mechanism to avoid unnecessary computation, while landmark management filters out unreliable landmarks so that the relative camera pose estimation algorithm can stably estimate the rotation and translation matrices relative to the previous time step. To build a complete 3D feature map, this dissertation proposes a linear equation that updates landmark states with quadratic convergence; the absolute camera pose is then estimated through a linear localization equation. When the robot revisits a previously seen scene, the similarity between earlier images and the current image is described by a linear model, and an outlier weighting function filters out outlier images so that loop closures are detected correctly; the robot can then correct every camera and landmark state through an improved trajectory bending algorithm, yielding more accurate localization and mapping results. Moreover, exploiting the parallelism of hardware acceleration circuits, the system is implemented on a low-end FPGA platform to deliver the robot's state and the environment map rapidly, where the One-Sided Hestenes-Jacobi algorithm is one of the designed modules, used to realize the singular value decomposition module.

To verify the proposed V-SLAM system, software simulations, experiments with an RGB-D camera in a small-scale indoor environment, and stereo-vision experiments on the well-known KITTI dataset in a large-scale outdoor environment are compared against existing works. The results show that the linear-model-based V-SLAM system stably provides accurate localization, that the landmark update algorithm indeed builds a more complete 3D map, and, via precision-recall curves, that the proposed loop closure detection algorithm detects loops correctly. In the hardware experiments, feature points from real environments are used to verify the hardware; compared with a general-purpose PC, the FPGA accelerates localization and mapping by approximately 350 and 460 times, respectively, demonstrating that the proposed V-SLAM system achieves real-time simultaneous localization and mapping on a low-end, low-cost, low-power platform.
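The abstract does not detail how the relative rotation and translation are recovered from matched feature points; a standard closed-form approach for this step (and one plausible reason an SVD module is worth implementing in hardware) is the Kabsch/Procrustes alignment of matched 3D points. The sketch below is illustrative only; the function name and interface are hypothetical and not taken from the thesis.

```python
import numpy as np

def relative_pose_from_matches(P, Q):
    """Estimate rotation R and translation t such that Q ≈ R @ p + t
    for matched 3D points (rows of P and Q), via the Kabsch algorithm."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

In a V-SLAM front end, P and Q would be the 3D positions of the same landmarks observed from two consecutive key-frames; in practice the alignment is wrapped in an outlier-rejection scheme such as RANSAC.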

Abstract (English)


In this dissertation, the visual simultaneous localization and mapping (V-SLAM) problem is addressed by proposing a V-SLAM system based on linear models. Moreover, to obtain a low-cost, low-power, and computationally efficient V-SLAM system, an FPGA implementation of the proposed approach is established. The proposed V-SLAM system employs the SIFT feature detection and description algorithm to extract features from an image, which are subsequently used to decide whether the input image is a key-frame. Furthermore, map management is proposed to filter out unstable landmarks so that the relative camera pose can be estimated reliably. To build a consistent 3D map, landmarks are updated using an iterative linear equation with quadratic convergence, and the updated landmarks are used to estimate the absolute camera pose according to a linear model. To detect potential loop closures, another linear model is designed to describe the similarity between previously seen images and the current one, so that looped key-frames can be found successfully. If a loop is detected, an improved trajectory bending algorithm is subsequently employed to revise the states of the camera as well as the landmarks. Benefiting from the parallelism of hardware, an FPGA implementation of the proposed V-SLAM system is developed, in which the One-Sided Hestenes-Jacobi algorithm is designed to provide the singular value decomposition of a matrix. To verify the proposed system, extensive simulations and experiments are conducted in both indoor small-scale and outdoor large-scale environments: the former uses an Xtion RGB-D camera, while the latter uses stereo vision from the public KITTI dataset. Compared with existing methods, the proposed approach shows superior estimation results according to the experiments.
As for the hardware implementation, features extracted from an indoor environment are used to verify the effectiveness of the system. Experimental results show that computation on the FPGA is approximately 350 and 460 times faster than on a general-purpose PC for localization and mapping, respectively.
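The One-Sided Hestenes-Jacobi algorithm mentioned above is a classical SVD method that is popular for hardware implementation because pairs of columns can be rotated independently and in parallel. The minimal floating-point sketch below illustrates the idea only; the thesis's fixed-point FPGA module will necessarily differ in arithmetic and scheduling.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD: apply plane rotations to column
    pairs of U (initialized to A) until all columns are mutually
    orthogonal. The singular values are then the column norms, and
    A = U_normalized @ diag(s) @ V.T."""
    U = np.array(A, dtype=float)
    m, n = U.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) > tol * np.sqrt(alpha * beta):
                    converged = False
                    # Rotation angle that zeroes the (p, q) inner product
                    zeta = (beta - alpha) / (2.0 * gamma)
                    sign = 1.0 if zeta >= 0 else -1.0
                    t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                    c = 1.0 / np.sqrt(1.0 + t * t)
                    s = c * t
                    R2 = np.array([[c, s], [-s, c]])
                    U[:, [p, q]] = U[:, [p, q]] @ R2
                    V[:, [p, q]] = V[:, [p, q]] @ R2
        if converged:
            break
    sing = np.linalg.norm(U, axis=0)
    order = np.argsort(sing)[::-1]          # descending singular values
    sing = sing[order]
    U = U[:, order] / np.where(sing > 0, sing, 1.0)
    V = V[:, order]
    return U, sing, V
```

Because each rotation touches only two columns, disjoint column pairs within a sweep can be processed concurrently, which maps naturally onto parallel FPGA processing elements.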
