Depth Estimation of Objects with Known Height Using a Monocular Camera

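Distance estimation will play an important role in future smart cities. Estimating the distance to objects enables a wide range of applications, including autonomous driving systems, vehicle localization, and the updating of street-view map data, and it is an indispensable technology in all of them. Among existing distance estimation techniques, one class of methods uses sensors such as radar or lidar, which emit electromagnetic waves or light toward surrounding objects to measure distance. These methods are accurate and fast but comparatively expensive. Another class uses low-cost inputs such as images and relies on powerful algorithms to estimate the distance of objects in the image. In this thesis, we attempt to estimate object distance with a monocular camera. The problem can be divided into two categories: the distance between the camera and objects on the ground, and the distance between the camera and objects that are not on the ground, such as traffic lights. Most existing research focuses on objects on the ground, and its results are accurate and stable. The second category usually requires additional information to solve. We therefore propose using the real-world height of the target object together with the camera imaging model to derive the relationship between the object's coordinates in the image and the coordinates of its corresponding ground point. With this relationship, we can obtain the ground coordinates of any object of known height and then apply existing algorithms for the first category to estimate the distance between that ground point and the camera. The proposed algorithm estimates the distance of objects of known height in a single image at low cost and in real time. It is also data-independent, meaning that its performance is not affected by differences in the data used. Experimental results show that our method achieves an overall mean absolute percentage error (MAPE) of 8% across three different environments and outperforms a learning-based model by 30% when estimating the distance of non-ground objects. However, our method cannot be applied to objects of low height, and it requires an additional object detection model to obtain the object's position in the image.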

Keywords

monocular distance estimation; camera geometry; perspective transform

Parallel Abstract

Distance estimation technology is an essential component of future smart cities. It enables a variety of applications such as autonomous driving systems and map updating. There are various methods for distance estimation, including active sensors such as radar and lidar, which emit electromagnetic waves or light to measure the distance to surrounding objects. These methods are generally accurate and fast, but they are also expensive. Alternatively, low-cost inputs such as images can be combined with powerful algorithms to estimate the distance of objects in the image. In this thesis, we estimate the distance of objects using a monocular camera. This problem can be divided into two categories. The first is estimating the distance between the camera and objects on the ground. The second is estimating the distance between the camera and objects above the ground, such as traffic lights. Most existing research focuses on the former, where estimation is quite accurate and stable. Solving the latter usually requires additional information such as the object size. Our method uses the height of the target object and the camera imaging model to derive the relationship between the coordinates of an object in the image plane and its projection point on the ground. From this relationship, we calculate the coordinates of the object’s projection point on the ground. An existing algorithm for the first category of problems is then used to estimate the distance of the projection point. With the proposed method, the distance of an object with a known height in a monocular image can be estimated at low cost and in real time. Moreover, our method is data-independent: its performance does not depend on the data used. The experimental results show that the overall mean absolute percentage error of our method in three different environments is 8%, and that it outperforms a learning-based model by 30% when estimating the distance of objects above the ground. However, our method cannot be used to estimate the distance of objects with a low height, and it needs an additional object detection model to obtain the position of an object in the image.
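The two-step idea described in the abstract (map an elevated object to its ground point, then apply a ground-plane distance estimate) can be illustrated with a minimal sketch. This is not the derivation used in the thesis. It assumes an ideal pinhole camera with a horizontal optical axis, a flat ground plane, and a known camera height, and the names `Camera`, `ground_row`, and `ground_distance` are hypothetical. Under these assumptions an object at height H and depth Z appears at image row v = cy + f(h - H)/Z, and the ground point below it appears at v_g = cy + f h/Z, where h is the camera height.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera with a horizontal optical axis (assumption)."""
    focal_px: float   # focal length in pixels, assumed known from calibration
    cy: float         # principal point row in pixels (the horizon row here)
    height_m: float   # camera height above the flat ground plane, in metres


def ground_row(cam: Camera, v_obj: float, obj_height_m: float) -> float:
    """Image row of the ground point directly below an object of known height.

    Follows from v_obj - cy = f * (h - H) / Z and v_g - cy = f * h / Z,
    which give v_g - cy = (v_obj - cy) * h / (h - H).
    """
    dh = cam.height_m - obj_height_m
    if abs(dh) < 1e-6:
        # The object sits on the horizon row, so the ratio is undefined.
        raise ValueError("object height equals camera height")
    return cam.cy + (v_obj - cam.cy) * cam.height_m / dh


def ground_distance(cam: Camera, v_ground: float) -> float:
    """Depth of a ground-plane point from its image row: Z = f * h / (v_g - cy)."""
    dv = v_ground - cam.cy
    if dv <= 0:
        raise ValueError("row is at or above the horizon")
    return cam.focal_px * cam.height_m / dv


if __name__ == "__main__":
    cam = Camera(focal_px=1000.0, cy=360.0, height_m=1.5)
    # A traffic light 5 m above the ground, detected at image row 185.
    v_g = ground_row(cam, v_obj=185.0, obj_height_m=5.0)
    print(v_g, ground_distance(cam, v_g))
```

In this simplified model the mapping becomes ill-conditioned when the object height is close to the camera height, because the object then lies near the horizon row. A tilted camera or lens distortion would require the full camera model rather than this one-dimensional relation.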

Parallel Keywords

monocular distance estimation; camera geometry; perspective transform
