
Implementation of a Deep-Learning-Based Depth Estimation and Pedestrian-Vehicle Semantic Segmentation Fusion Method and Its Application to 3D Reconstruction of Road Environments

Advisor: 蔡奇謚


Abstract


In the development of self-driving vehicles, perception of the surrounding environment is one of the core technologies: the system must not only measure the spatial position of each obstacle but also recognize its type, so that the vehicle can understand its surroundings and perform safe obstacle avoidance. To achieve road environment perception, most modern self-driving systems fuse LiDAR and radar measurements to obtain the spatial positions of surrounding objects. However, this approach is not only expensive but also unable to recognize object categories such as pedestrians or vehicles. This thesis proposes a deep-learning-based depth estimation and semantic segmentation system that takes a single monocular camera image, performs depth estimation and pedestrian-vehicle semantic segmentation, and fuses the two results to reconstruct the 3D positions of the pedestrians and vehicles around the self-driving vehicle, enabling it to make correct obstacle-avoidance decisions within a safe range. The proposed method can not only help a self-driving system estimate the distance and identify the type of objects on the road, but also reduce the cost of the sensing system. In addition, because a dataset of Taiwanese road scenes was needed to train the depth estimation network, we collected training and testing data on roads in Taiwan with a ZED stereo camera. The semantic segmentation network was trained on the Cityscapes dataset, using only the pedestrian and car labels. After training and testing the deep neural network models on these datasets, the proposed system successfully identifies pedestrians and vehicles on the road and reconstructs their 3D positions relative to the camera.
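The fusion step described above amounts to back-projecting the segmented pixels through the pinhole camera model using the estimated depth. A minimal sketch of that idea follows; the intrinsic values, class label, and function name are illustrative assumptions, not the thesis's actual code:

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values, not the ZED's calibration).
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0

def backproject_objects(depth, seg_mask, class_id):
    """Fuse a depth map (H x W, metres) with a segmentation mask
    (H x W, integer labels): return the camera-frame 3D points of
    every pixel that belongs to `class_id`."""
    v, u = np.nonzero(seg_mask == class_id)   # pixel rows/cols of the class
    z = depth[v, u]                           # estimated depth at those pixels
    x = (u - cx) * z / fx                     # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)        # (N, 3) points

# Toy example: a 4x4 scene with one "car" pixel (assumed label 2) at 10 m.
depth = np.full((4, 4), 10.0)
mask = np.zeros((4, 4), dtype=int)
mask[2, 3] = 2
pts = backproject_objects(depth, mask, class_id=2)
print(pts.shape)   # (1, 3)
```

Running the same function once per class of interest yields the per-object point sets from which the pedestrian and vehicle positions relative to the camera are recovered.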
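Restricting Cityscapes to the pedestrian and car classes can be done by remapping label IDs before training. A sketch assuming the standard Cityscapes `labelIds` encoding (person = 24, car = 26); the thesis's actual preprocessing may differ:

```python
import numpy as np

# Standard Cityscapes labelIds for the two classes kept during training.
CITYSCAPES_PERSON, CITYSCAPES_CAR = 24, 26

def remap_labels(label_img):
    """Map a Cityscapes labelIds image to a 3-class training target:
    0 = background, 1 = pedestrian, 2 = car."""
    out = np.zeros_like(label_img)
    out[label_img == CITYSCAPES_PERSON] = 1
    out[label_img == CITYSCAPES_CAR] = 2
    return out

# Toy label image mixing road (7), person (24), and car (26) pixels.
labels = np.array([[7, 24],
                   [26, 7]])
print(remap_labels(labels))   # [[0 1]
                              #  [2 0]]
```

All other Cityscapes classes (road, building, rider, and so on) collapse into the background class, matching the two-class training setup described above.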

