
RF-Net: Regional Fusion Network for 3D On-road Object Detection Based on RGB Images and Point Clouds

Advisor: 傅立成 (Li-Chen Fu)
Co-advisor: 蕭培墉 (Pei-Yung Hsiao)

Abstract


In recent years, research and development in autonomous driving has grown increasingly popular and mature, and the overall quality of driving decisions has become more accurate thanks to advances in both hardware and software. Many technology companies have begun developing advanced driver-assistance systems (ADAS), in which 3D object detection is one of the most important and indispensable components: with it, a system can accurately localize obstacles on the road and make the correct decisions. Most sensors mounted on today's autonomous vehicles are cameras and LiDAR. Since each of these two sensors has its own inherent strengths and weaknesses, this thesis proposes a 3D object detection network that combines the advantages of both through data fusion.

We first analyze current state-of-the-art 3D object detection methods, all of which use deep learning and neural networks to learn representative features from RGB images and LiDAR point clouds for prediction. However, these methods first compress the point cloud into a bird's-eye view (BEV) and then extract features with conventional convolutional neural networks. Motivated by this observation, this thesis proposes a novel 3D object detection network that takes raw point clouds directly as input to preserve the original data. In addition, we propose a region-of-interest-based fusion method: by fusing data only within regions of interest, the time otherwise spent fusing uninteresting regions is avoided, improving runtime.

To validate the proposed method, we evaluate it on the KITTI dataset, currently regarded as the most challenging benchmark for 3D object detection. The results show that our method achieves an average precision above 80%.
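The region-of-interest-based fusion idea above can be illustrated with a minimal plain-Python sketch. All names, shapes, and the concatenation-style fusion here are illustrative assumptions, not the thesis's actual learned layer; the point is only that fusion work is skipped for points falling outside the RoI:

```python
def regional_fusion(points, image_features, roi):
    """Fuse point-wise and image features only inside a region of interest.

    points:         list of (u, v, point_feat) -- LiDAR points already
                    projected to image coordinates, with their features
    image_features: dict mapping (u, v) -> image feature vector
    roi:            (u_min, v_min, u_max, v_max) in image coordinates
    """
    u_min, v_min, u_max, v_max = roi
    fused = []
    for u, v, point_feat in points:
        # Points outside the RoI are skipped entirely -- this is where
        # the time otherwise spent on uninteresting regions is saved.
        if not (u_min <= u <= u_max and v_min <= v <= v_max):
            continue
        # Fall back to zeros when no image feature exists at this pixel.
        img_feat = image_features.get((u, v), [0.0] * len(point_feat))
        # Toy fusion by concatenation; the real layer learns this mapping.
        fused.append(point_feat + img_feat)
    return fused
```

For example, with one point inside the RoI and one outside, only the inside point is fused: `regional_fusion([(1, 1, [0.5]), (10, 10, [0.2])], {(1, 1): [0.9]}, (0, 0, 5, 5))` returns `[[0.5, 0.9]]`.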

Abstract (English)


Over the past few years, research and development of autonomous driving technology has been prevalent, and performance has improved significantly in both hardware and software. In particular, 3D object detection is an indispensable key technique for autonomous driving. This thesis aims to propose a 3D object detector for on-road vehicles that takes both LiDAR point clouds and RGB images as inputs and provides accurate 3D bounding boxes for the vehicles. We present a novel two-stream fusion-based 3D object detection network, called Regional Fusion Network (RF-Net), which includes a multi-scale feature aggregation module and a regional fusion layer to provide region-of-interest-level (RoI-level) fusion between RGB images and LiDAR point clouds. The salient feature of our work is that RF-Net uses raw LiDAR point clouds directly as input, without any quantization, to avoid loss of information. First, rough estimates of foreground objects are generated through the LiDAR and RGB streams simultaneously. The proposed multi-scale feature aggregation module exploits both high-level and low-level RGB features to capture objects ranging from small to large. In addition, the proposed regional fusion layer combines point-wise features from the LiDAR stream with multi-scale spatial features from the RGB stream to generate fully fused features for further 3D box refinement. Experimental results on the challenging KITTI Vision Benchmark show that the proposed RF-Net outperforms other state-of-the-art methods in mean average precision (mAP), and ablation studies demonstrate that our approaches improve the quality of 3D object detection.
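The multi-scale aggregation described above combines high-level (coarse, semantically rich) and low-level (fine, spatially precise) features. A minimal sketch of that combination, assuming toy list-of-lists feature grids and nearest-neighbour upsampling rather than the module's actual learned operations:

```python
def aggregate_multiscale(fine, coarse):
    """Pair low-level (fine) and high-level (coarse) feature grids.

    fine:   H x W grid of feature values (list of lists)
    coarse: (H // 2) x (W // 2) grid from a deeper, downsampled layer
    Returns an H x W grid where each cell carries both scales, so small
    objects keep fine detail while large objects get high-level context.
    """
    height = len(fine)
    width = len(fine[0])
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            # Nearest-neighbour upsampling of the coarse grid: each
            # coarse cell covers a 2x2 block of fine cells.
            row.append((fine[i][j], coarse[i // 2][j // 2]))
        out.append(row)
    return out
```

For instance, `aggregate_multiscale([[1, 2], [3, 4]], [[9]])` yields `[[(1, 9), (2, 9)], [(3, 9), (4, 9)]]`: every fine cell is paired with the single coarse feature covering it.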
