With the development of the vehicle industry, object detection technology is widely applied on the road. Existing camera-based object detection networks achieve strong results in clear weather. However, unexpected adverse weather often occurs in outdoor environments, and additional sensors (e.g., lidar, radar) are then needed to complement the camera against adverse conditions. Multimodal fusion has recently received increasing attention, but previous work did not consider the data types of vehicle sensors and the scenarios each is suited to. We therefore propose a novel end-to-end multimodal multistage object detection network called MT-DETR (MulTimodal MulTistage DETR). Compared with unimodal object detection networks, it adds fusion modules and enhancement modules and adopts a hierarchical fusion mechanism. We use the Confidence Fusion Module and the Residual Fusion Module to fuse camera, lidar, radar, and time features, and the Residual Enhancement Module to strengthen each unimodal branch. To ensure and reinforce the effectiveness of each branch, we adopt a multistage loss. Finally, we improve the foggy image synthesis method and use foggy camera-lidar data pairs as training data to enhance the model's performance in adverse weather. MT-DETR outperforms previous methods on the test splits of the STF dataset. In addition, we replace the feature extractor of MT-DETR, and MT-DETR surpasses the baseline in every experiment, confirming its generality and extensibility.
Driven by the needs of autonomous driving, object detection is widely used on the road. Existing camera-based object detection networks perform well in clear weather. Unfortunately, unexpected adverse weather sometimes occurs in outdoor environments. In such cases, additional sensors (e.g., lidar, radar) are needed to help the camera adapt to bad weather. Multimodal fusion has received increasing attention recently; however, previous work did not consider the data types and behaviors of vehicle sensors. Therefore, we propose a novel end-to-end multimodal multistage object detection network called MT-DETR. Compared with unimodal object detection networks, MT-DETR adds fusion modules and enhancement modules and adopts a hierarchical fusion mechanism. We propose the Confidence Fusion Module (CFM) and the Residual Fusion Module (RFM) to fuse camera, lidar, radar, and time features, and present the Residual Enhancement Module (REM) to strengthen each unimodal branch. In addition, we introduce a multistage loss to reinforce the effectiveness of each branch. Finally, we improve the foggy image synthesis method and utilize foggy camera-lidar data pairs as training data to improve the model's performance in unseen adverse weather. Extensive experiments on the clear, light-fog, dense-fog, and snow splits of the STF dataset demonstrate that MT-DETR outperforms previous state-of-the-art methods. Furthermore, we replace the feature extractor of MT-DETR and demonstrate its generality and extensibility.
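The residual-fusion idea behind the RFM can be illustrated with a minimal sketch: the camera branch is kept intact through a skip connection, while a residual correction is computed from the concatenated multimodal features. This is only a conceptual sketch under simplifying assumptions; the actual RFM is a learned neural module, and the single `weight` matrix here merely stands in for that learned fusion layer.

```python
import numpy as np

def residual_fusion(cam_feat, lidar_feat, weight):
    """Sketch of residual fusion: refine the camera features with a
    residual computed from the concatenated camera + lidar features.
    `weight` is a stand-in for the learned fusion layer (an assumption;
    the paper's RFM is a full module, not a single linear projection)."""
    concat = np.concatenate([cam_feat, lidar_feat], axis=-1)
    residual = concat @ weight   # project joint features back to camera dims
    return cam_feat + residual   # skip connection preserves the camera branch

rng = np.random.default_rng(0)
cam = rng.standard_normal((1, 8))
lidar = rng.standard_normal((1, 8))
w = rng.standard_normal((16, 8)) * 0.01  # small random "learned" weights
fused = residual_fusion(cam, lidar, w)
print(fused.shape)  # (1, 8): same dimensionality as the camera features
```

Because of the skip connection, a degenerate fusion layer (zero weights) reduces the module to the identity on the camera branch, which is one reason residual designs degrade gracefully when an auxiliary sensor contributes little.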