
Automatic Assessment of Road Object Hazardous Level using Vehicle Dashcam Video and Driver Gaze

Advisor: 蔡欣穆 (Hsin-Mu Tsai)

Abstract


According to statistics from Taiwan, driver negligence is the most common cause of traffic accidents, ahead of mechanical malfunction and other causes. Many Advanced Driver Assistance Systems (ADAS) have therefore been designed to help drivers make more appropriate decisions in complicated traffic situations. One feasible approach is to integrate the information collected by various onboard sensors and use it to alert the driver, effectively extending their field of view; with these alerts, drivers can notice the relevant objects and react with lower latency. In this thesis, we adopt a deep learning approach to this problem. Based on dashcam video and the driver's gaze information, we design a convolutional neural network that grades the hazard level of each object detected in a frame. By analyzing the RGB frame and the semantic segmentation frame, we obtain the appearance of the target object and its surrounding context, respectively, for the subsequent classification. Our implementation uses the DR(eye)VE dataset [13], which contains dashcam videos collected in Italy together with the driver's gaze at the time of recording. We manually selected potentially dangerous video clips, annotated the ground-truth hazard level of the detected objects, and used this data to train and evaluate our model. The model achieves 89% overall accuracy and detects 80% of the objects that truly belong to the hazardous category. In addition, we collected a local dashcam video dataset in Taiwan to analyze how well the model generalizes to different road environments. Finally, a simple user study shows that with our system, drivers notice about 20% more objects that require their attention.

Parallel Abstract


Driver negligence has been reported as the most common cause of road accidents in Taiwan, resulting in more crashes every year than mechanical malfunction and other major accident causes. Many Advanced Driver Assistance Systems (ADAS) are designed to help drivers make better driving decisions in complicated traffic scenarios with shorter latency. One possible solution is to process the information collected by various onboard sensors and offer alerts or warnings that augment the driver's field of view. With these annotations, drivers can pay attention to relevant objects and respond to hazardous situations with lower latency. In this work, we tackle the problem with a deep learning approach. Based on dashcam video frames and the gaze information of the driver, we design a two-branch Convolutional Neural Network (CNN) that categorizes the discretized hazard level of each object detected in a frame. Our CNN model learns to capture the appearance (e.g., orientation) and proximity (e.g., the relation between an object and its surrounding environment) of an object through the RGB frame and the segmentation frame, respectively. Evaluation is performed on the DR(eye)VE dataset [13], which contains dashcam videos recorded in Italy along with the driver's gaze. We manually pick several potentially dangerous video clips, annotate the ground-truth hazard level of the detected objects, and use them to train and evaluate our model. Our model achieves 89% overall accuracy and 80% recall on hazardous objects. It also achieves 74% overall accuracy on a local dashcam video dataset collected in Taiwan, which shows that the model generalizes reasonably well to different road environments. Lastly, a simple user study indicates that with our system drivers notice about 20% more hazardous objects.
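
To make the two-branch design concrete, below is a minimal PyTorch sketch of how such a network could be organized: one branch encodes the RGB crop of a detected object (appearance), the other encodes the matching semantic-segmentation crop (proximity/context), and the concatenated features, together with a per-object gaze feature, are classified into discrete hazard levels. All class names, layer sizes, crop dimensions, the number of hazard levels, and the way gaze is fused are illustrative assumptions; the abstract does not specify the exact architecture.

```python
import torch
import torch.nn as nn

class TwoBranchHazardNet(nn.Module):
    """Sketch of a two-branch CNN for per-object hazard-level
    classification, loosely following the abstract's description.
    Backbone depth, crop size, and num_levels are assumptions."""

    def __init__(self, num_levels: int = 3):
        super().__init__()
        # Appearance branch: RGB crop of the detected object.
        self.rgb_branch = self._make_branch(in_channels=3)
        # Proximity branch: segmentation crop of the same region,
        # capturing the object's relation to its surroundings.
        self.seg_branch = self._make_branch(in_channels=3)
        # +1 for a scalar gaze feature (hypothetical fusion choice),
        # e.g., overlap between the gaze map and the object box.
        self.classifier = nn.Sequential(
            nn.Linear(2 * 128 + 1, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_levels),
        )

    @staticmethod
    def _make_branch(in_channels: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),             # -> (N, 128)
        )

    def forward(self, rgb_crop, seg_crop, gaze_feat):
        feats = torch.cat(
            [self.rgb_branch(rgb_crop), self.seg_branch(seg_crop), gaze_feat],
            dim=1,
        )
        return self.classifier(feats)  # logits over hazard levels


# Usage sketch: a batch of four 64x64 object crops.
model = TwoBranchHazardNet(num_levels=3)
rgb = torch.randn(4, 3, 64, 64)
seg = torch.randn(4, 3, 64, 64)
gaze = torch.rand(4, 1)
logits = model(rgb, seg, gaze)  # shape: (4, 3)
```

In the full system, the per-object crops would come from an object detector and tracker (SORT [2] is among the cited works) and the gaze feature from the eye-tracking signal; none of those upstream stages are shown here.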

References


[1] 道路交通事故統計 (Road Traffic Accident Statistics), Jul 2020.
[2] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468, 2016.
[3] F.-H. Chan, Y.-T. Chen, Y. Xiang, and M. Sun. Anticipating accidents in dashcam videos. In Asian Conference on Computer Vision (ACCV), volume 10114, pages 136–153, 2017.
[4] C. Chiou, W. Wang, S. Lu, C. Huang, P. Chung, and Y. Lai. Driver monitoring using sparse representation with part-based temporal face descriptors. IEEE Transactions on Intelligent Transportation Systems, 21(1):346–361, 2020.
[5] V. E. Dahiphale and S. R. Rao. A review paper on portable driver monitoring system for real-time fatigue. In 2015 International Conference on Computing Communication Control and Automation, pages 558–560, 2015.
