
Autonomous Mobile Robot Real-time Visual Localization and Uncertainty Estimation System

Advisor: 李綱
The full text of this thesis will be available for download on 2025/08/05.

Abstract


This thesis presents a real-time visual localization system for intelligent vehicles. The idea comes from how humans determine their location using visible landmarks. Intelligent vehicles include self-driving cars and autonomous mobile robots, both of which require real-time, accurate, and robust localization. Autonomous mobile robots usually rely on 2D LiDAR, but in many situations, such as corridors, a 2D LiDAR cannot obtain enough features or landmarks for localization. A camera, in contrast, can capture different features, such as notice boards, pipes, lights, and even distant vanishing points. These features are inherent to the building and are useful for localization, so adding a camera is a reasonable way to improve localization performance. Self-driving cars, in turn, usually localize with 3D LiDAR or GPS/GNSS. Although 3D LiDAR-based localization algorithms provide accurate results, LiDAR suffers in real applications from the initial-pose problem (or global localization problem), the kidnapped-robot problem, high computational cost, and high hardware cost. The visual localization system in this work aims to provide a low-cost, reliable, real-time, and accurate solution for indoor and outdoor applications. The system trains a convolutional neural network to estimate an intelligent vehicle's pose and its uncertainty from a single RGB image through end-to-end learning, with no additional feature engineering or graph optimization. Unlike particle filters and conditional random fields, most deep-learning regression models have no way to evaluate uncertainty, and this uncertainty is important for preventing intelligent-vehicle failures; this thesis therefore proposes a method to address the problem. The visual localization system is implemented with an efficient, lightweight deep convolutional neural network for embedded vision applications, verifying that convolutional neural networks can solve the complex visual localization problem. Experimental results show that our visual localization system can globally relocalize within a given environment, solving the lost, kidnapped, and initial-pose problems. The system is further validated through experiments on our indoor autonomous mobile robot and on public indoor and outdoor datasets. It achieves an accuracy of approximately 1.74 m and 7.01° in large outdoor scenes and 2.7 m and 9.27° indoors. It also runs in real time on an embedded system (Nvidia Jetson Xavier), taking 49 ms per frame (equivalent to 20.27 fps).

Parallel Abstract


This thesis proposes a real-time visual localization system for autonomous intelligent vehicles. The idea comes from how humans determine their location using visible landmarks. Autonomous intelligent vehicles include self-driving cars and autonomous mobile robots (AMRs), both of which require a real-time, accurate, and robust localization system. AMRs often rely on two-dimensional light detection and ranging (2D LiDAR). However, in many scenarios, such as corridors, a 2D LiDAR cannot get enough features or landmarks to localize. Alternatively, a camera can obtain different features, such as notice boards, pipes, ceiling lights, and even vanishing lines. These features are born with the building and are useful for localization. Thus, adding camera features is a reasonable way to improve localization performance. Self-driving cars often depend on three-dimensional LiDAR (3D LiDAR) or GPS/GNSS. Although 3D LiDAR-based localization algorithms offer accurate localization results, they suffer from the initial-pose problem (or global localization problem), the kidnapped-robot problem, high computational cost, and high hardware cost in real applications. The proposed visual localization system aims to provide a low-cost, robust, real-time, and accurate solution for indoor and outdoor applications. The system trains a convolutional neural network to estimate an intelligent vehicle's pose and its uncertainty from a single RGB image through end-to-end learning, with no need for additional feature engineering or graph optimization. Unlike particle filters and conditional random fields, most deep learning regression models have no way to evaluate uncertainty. This uncertainty is important for preventing an intelligent vehicle from failing, so this thesis proposes a new method to address the issue. The proposed system is built on an efficient, lightweight deep convolutional neural network for embedded vision applications, demonstrating that a convolutional neural network can solve complicated visual localization problems. The experimental results show that our visual localization system can globally relocalize within a given environment, which solves the lost, kidnapped, and initial-pose problems, and that it operates in real time both indoors and outdoors. Furthermore, the proposed system is verified through experiments, including our indoor AMR and public indoor and outdoor datasets. It achieves approximately 1.74 m and 7.02° accuracy for large-scale outdoor scenes and 0.31 m and 9.27° accuracy indoors. It also runs in real time, taking 49 ms per frame to compute (20.27 fps) on the embedded system (Nvidia Jetson Xavier).
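The pose-and-uncertainty regression described in the abstract can be made concrete with a short sketch. The PyTorch code below is a minimal illustration, not the thesis implementation: the MobileNetV2 backbone, the layer sizes, and the reliability threshold are assumptions standing in for the unnamed "efficient and lightweight" network, and the loss follows the common heteroscedastic (learned log-variance) formulation for aleatoric uncertainty in deep regression.

```python
# A minimal sketch, assuming a PoseNet-style regressor with a learned-variance
# uncertainty head. NOT the thesis implementation: the backbone, layer sizes,
# and threshold below are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class PoseUncertaintyNet(nn.Module):
    """Regress a 6-DoF pose (translation + unit quaternion) and two learned
    log-variances (translational / rotational uncertainty) from one RGB image."""

    def __init__(self):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)  # lightweight stand-in backbone
        self.features = backbone.features             # conv feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_xyz = nn.Linear(1280, 3)      # position x, y, z
        self.fc_quat = nn.Linear(1280, 4)     # orientation as a quaternion
        self.fc_logvar = nn.Linear(1280, 2)   # log sigma^2: [translation, rotation]

    def forward(self, img):                   # img: (B, 3, H, W)
        h = self.pool(self.features(img)).flatten(1)
        xyz = self.fc_xyz(h)
        quat = F.normalize(self.fc_quat(h), dim=1)  # constrain to unit norm
        logvar = self.fc_logvar(h)
        return xyz, quat, logvar


def heteroscedastic_pose_loss(xyz, quat, logvar, xyz_gt, quat_gt):
    """Loss with learned aleatoric uncertainty: each residual is down-weighted
    by exp(-log sigma^2), and the added log-variance term keeps the network
    from predicting unbounded uncertainty to zero out the loss."""
    s_t, s_r = logvar[:, 0], logvar[:, 1]
    err_t = (xyz - xyz_gt).norm(dim=1)    # metres
    err_r = (quat - quat_gt).norm(dim=1)  # quaternion-distance proxy
    return (err_t * torch.exp(-s_t) + s_t + err_r * torch.exp(-s_r) + s_r).mean()


def is_pose_reliable(logvar, max_sigma_t=1.0):
    """Hypothetical run-time gate: reject a pose estimate whose predicted
    translational standard deviation exceeds a threshold (here 1.0 m)."""
    sigma_t = torch.exp(0.5 * logvar[:, 0])
    return sigma_t < max_sigma_t
```

A gate like `is_pose_reliable` is one plausible way to use the predicted uncertainty to keep an intelligent vehicle from acting on a bad estimate: low-confidence poses can be discarded or handed to a fallback localizer instead of being passed to the planner. Predicting the log-variance rather than the variance itself is a standard numerical-stability choice in this formulation.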
