
Confidence Fusion-based Multimodal Object Detection Network in Various Weather Conditions

Advisor: 李明穗

Abstract


With the development of the automotive industry, object detection has been widely deployed on the road. Existing camera-based object detection networks perform well in clear weather; however, unexpected adverse weather often occurs in outdoor environments, and additional sensors (e.g., lidar and radar) are then needed to complement the camera. Although multimodal fusion has recently attracted increasing attention, previous work did not take into account the data types of vehicle sensors or the conditions under which each sensor is applicable. We therefore propose a novel end-to-end multimodal multistage object detection network called MT-DETR (MulTimodal MulTistage DETR). Compared with a unimodal object detection network, it adds fusion and enhancement modules and adopts a hierarchical fusion mechanism: a Confidence Fusion Module (CFM) and a Residual Fusion Module (RFM) fuse camera, lidar, radar, and time features, while a Residual Enhancement Module (REM) strengthens each unimodal branch. To ensure and reinforce the effectiveness of every branch, we adopt a multistage loss. Finally, we improve the foggy-image synthesis method and use foggy camera-lidar data pairs as training data to boost the model's performance in adverse weather. MT-DETR outperforms previous methods on the STF test set. Moreover, when the feature extractor of MT-DETR is replaced, MT-DETR still outperforms the baseline methods in every experiment, confirming its generality and extensibility.

Parallel Abstract


Due to the need for autonomous driving, object detection is widely used on the road. Existing camera-based object detection networks perform well in clear weather. Unfortunately, unexpected adverse weather sometimes occurs in outdoor environments. In such cases, additional sensors (e.g., lidar, radar) are needed to help the camera adapt to bad weather. Multimodal fusion has received increasing attention recently; however, previous work did not consider vehicle sensors' data types and behaviors. Therefore, we propose a novel end-to-end multimodal multistage object detection network called MT-DETR. Compared with a unimodal object detection network, MT-DETR adds fusion and enhancement modules and adopts a hierarchical fusion mechanism. We propose a Confidence Fusion Module (CFM) and a Residual Fusion Module (RFM) to fuse camera, lidar, radar, and time features, and present a Residual Enhancement Module (REM) to strengthen each unimodal branch. In addition, we introduce a multistage loss to reinforce the effectiveness of each branch. Finally, we improve the foggy image synthesis method and utilize foggy camera-lidar data pairs as training data to improve the model's performance in unseen adverse weather. Extensive experiments on the clear, light-fog, dense-fog, and snow splits of the STF dataset demonstrate that MT-DETR outperforms previous state-of-the-art methods. Furthermore, we replace the feature extractor of MT-DETR and demonstrate its generality.
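The abstract does not spell out how the Confidence Fusion Module combines the camera, lidar, radar, and time branches. The sketch below only illustrates the general idea of confidence-weighted feature fusion: each branch predicts a scalar confidence for its own feature map, and the fused map is the softmax-weighted sum of the branch features. The class name ConfidenceFusion, the per-branch confidence heads, and all tensor shapes are assumptions made for illustration, not the thesis's actual CFM.

```python
import torch
import torch.nn as nn

class ConfidenceFusion(nn.Module):
    """Hypothetical confidence-weighted fusion of per-modality feature maps.

    Each branch predicts a scalar confidence from its own features; the
    fused feature map is the softmax-weighted sum of the branch features.
    This is an illustrative sketch, not the thesis's actual CFM.
    """

    def __init__(self, channels: int, num_branches: int):
        super().__init__()
        # One tiny confidence head per modality branch.
        self.conf_heads = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),   # (B, C, H, W) -> (B, C, 1, 1)
                nn.Flatten(),              # -> (B, C)
                nn.Linear(channels, 1),    # -> (B, 1) scalar confidence
            )
            for _ in range(num_branches)
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: list of (B, C, H, W) maps, one per modality branch.
        confs = torch.stack(
            [head(f) for head, f in zip(self.conf_heads, feats)], dim=1
        )                                              # (B, K, 1)
        weights = torch.softmax(confs, dim=1)          # normalize across modalities
        stacked = torch.stack(feats, dim=1)            # (B, K, C, H, W)
        weights = weights.unsqueeze(-1).unsqueeze(-1)  # (B, K, 1, 1, 1)
        return (weights * stacked).sum(dim=1)          # (B, C, H, W)

# Usage with three hypothetical branches (camera, lidar, radar):
cfm = ConfidenceFusion(channels=256, num_branches=3)
cam, lid, rad = (torch.randn(2, 256, 32, 32) for _ in range(3))
fused = cfm([cam, lid, rad])  # (2, 256, 32, 32)
```

With this weighting scheme, a branch whose features look unreliable (e.g., the camera in dense fog) can be down-weighted at inference time, which is the intuition behind confidence-based fusion.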
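Likewise, the improved foggy-image synthesis method is not detailed here. Physically based fog rendering is commonly built on the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). A minimal sketch under that assumption follows; the function name, parameter defaults, and the use of a dense per-pixel depth map are illustrative, not the thesis's actual procedure.

```python
import numpy as np

def synthesize_fog(image: np.ndarray, depth: np.ndarray,
                   beta: float = 0.05, airlight: float = 0.8) -> np.ndarray:
    """Render fog onto a clear image with the standard optical model
    I(x) = J(x) * t(x) + A * (1 - t(x)), where t(x) = exp(-beta * d(x)).

    image:    clear image J, float array in [0, 1], shape (H, W, 3)
    depth:    per-pixel depth d in meters, shape (H, W)
    beta:     attenuation coefficient (larger = denser fog)
    airlight: atmospheric light A (assumed grayscale here)
    """
    transmission = np.exp(-beta * depth)[..., None]  # (H, W, 1), broadcast over RGB
    return image * transmission + airlight * (1.0 - transmission)
```

Distant pixels receive a lower transmission value and are pulled toward the airlight color, which is why dense fog washes out far-away objects first.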

