
Anticipating Accidents based on Deep Learning in Dashcam Videos

Advisor: 孫民

Abstract


We propose a deep learning model, a Dynamic-Spatial-Attention (DSA) Recurrent Neural Network (RNN) (Fig. 1.1), to anticipate the moment an accident occurs in dashcam videos. Our model learns (1) which objects in the scene at each time step are likely to be dangerous, attending specifically to those objects, and (2) to reason over consecutive time steps about whether the dangerous objects are likely to lead to an accident. Anticipating accidents is much harder to analyze than anticipating driving maneuvers (e.g., lane changes or turns), because accidents happen suddenly and occur rarely on the road. We therefore (1) use a state-of-the-art object detection algorithm (Faster R-CNN [1]) to detect candidate objects and a tracking algorithm (MDP [2]) to track them, and (2) combine full-frame scene information with object appearance and trajectory cues to predict when an accident will occur. We collected 968 Taiwanese dashcam videos (Fig. 5.1) containing accidents of various types (e.g., motorbike hits motorbike, car hits motorbike, etc.), each annotated with the time of the accident and the types of objects involved, so these data support supervised training and quantitative evaluation. Our model anticipates accidents about 1.22 seconds before they occur with 80% recall and 46.92% precision, and achieves the highest mean average precision of 63.98%.

Parallel Abstract (English)


We propose a Dynamic-Spatial-Attention (DSA) Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos (Fig. 1.1). Our DSA-RNN learns to (1) distribute soft-attention to candidate objects dynamically to gather subtle cues and (2) model the temporal dependencies of all cues to robustly anticipate an accident. Anticipating accidents is much less addressed than anticipating events such as changing a lane or making a turn, since accidents are rarely observed and can happen suddenly in many different ways. To overcome these challenges, we (1) utilize a state-of-the-art object detector [1] and tracking-by-detection [2] to detect and track candidate objects, and (2) incorporate full-frame and object-based appearance and motion features in our model. We also harvest a diverse dataset of 968 dashcam accident videos from the web (Fig. 5.1). The dataset is unique in that various accidents (e.g., a motorbike hits a car, a car hits another car, etc.) occur in all videos. We manually mark the time-location of accidents and use them as supervision to train and evaluate our method. We show that our method anticipates accidents about 1.22 seconds before they occur with 80% recall and 46.92% precision. Most importantly, it achieves the highest mean average precision (63.98%), outperforming other baselines without attention or RNN.
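The mechanism described above (soft-attention distributed over tracked candidate objects, followed by recurrent modeling of temporal dependencies) can be sketched in a few lines. This is a minimal numpy illustration, not the thesis implementation: the names (`dsa_step`, `W_att`) and the plain tanh recurrence are assumptions made for brevity, whereas the actual model uses Faster R-CNN object features and an LSTM cell.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dsa_step(frame_feat, obj_feats, h_prev, params):
    """One time step of a DSA-style recurrent cell (illustrative).

    frame_feat: (F,) full-frame appearance/motion feature
    obj_feats:  (N, D) features of N tracked candidate objects
    h_prev:     (H,) previous recurrent hidden state
    """
    # Score each candidate object against the previous hidden state,
    # then normalize the scores into soft-attention weights.
    scores = obj_feats @ params["W_att"] @ h_prev        # (N,)
    alpha = softmax(scores)                              # (N,), sums to 1

    # Attention-weighted sum of object cues: dangerous-looking objects
    # (high alpha) dominate the aggregated object context.
    obj_ctx = alpha @ obj_feats                          # (D,)

    # Concatenate full-frame and attended object cues, then update the
    # recurrent state (a simple tanh RNN here; an LSTM in the thesis).
    x = np.concatenate([frame_feat, obj_ctx])            # (F + D,)
    h = np.tanh(params["W_xh"] @ x + params["W_hh"] @ h_prev)

    # Sigmoid readout -> probability that an accident is imminent.
    p = 1.0 / (1.0 + np.exp(-(params["w_out"] @ h)))
    return h, alpha, p

# Toy dimensions and random weights, just to run the step end to end.
F, D, H, N = 8, 6, 10, 4
params = {
    "W_att": rng.normal(size=(D, H)) * 0.1,
    "W_xh":  rng.normal(size=(H, F + D)) * 0.1,
    "W_hh":  rng.normal(size=(H, H)) * 0.1,
    "w_out": rng.normal(size=(H,)) * 0.1,
}
h = np.zeros(H)
for t in range(5):  # 5 video frames of synthetic features
    frame = rng.normal(size=(F,))
    objs = rng.normal(size=(N, D))
    h, alpha, p = dsa_step(frame, objs, h, params)
```

At each frame the cell emits an accident probability `p`, so recall/precision at a given anticipation horizon can be measured by thresholding `p` and checking how early it fires before the annotated accident time.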

References


[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in NIPS, 2015.
[2] Y. Xiang, A. Alahi, and S. Savarese, "Learning to track: Online multi-object tracking by decision making," in ICCV, 2015.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[4] H. Wang and C. Schmid, "Action recognition with improved trajectories," in ICCV, 2013.
[5] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 1997.
[6] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
