

A Vision-Based Pedestrian-Endanger-Driving-Safety Evaluation System Using Deep Learning Techniques

Advisor: 方瓊瑤

Abstract


Advances in transportation have made travel between places ever more convenient, but they have also brought many traffic accidents. According to the statistical yearbook of Taiwan's Ministry of the Interior [4], traffic accidents injure or kill hundreds of thousands of people in Taiwan every year, with vulnerable pedestrians on the road suffering the greatest harm. This study therefore proposes a vision-based system, built on deep learning, for evaluating the degree to which a pedestrian endangers driving safety. The system takes dashboard-camera video as input and is intended for active driver assistance: it recognizes each pedestrian's degree of danger to driving safety and warns the driver in advance, with the aim of reducing traffic accidents.

The study first analyzes and defines the degree to which a pedestrian endangers driving safety. Based on five conditions, namely the distance between the pedestrian and the camera, the pedestrian's position in the image, the direction the pedestrian faces, whether the pedestrian is moving, and whether the pedestrian is backlit, fourteen situations are distinguished and mapped to four categories: safe, low danger, medium danger, and high danger. A rule sketch of this mapping appears below.
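
The abstract does not reproduce the full fourteen-situation table, so the following is a minimal Python sketch of what such a rule mapping could look like; the attribute names, thresholds, and rules are illustrative assumptions, not the thesis's actual conditions.

    from dataclasses import dataclass

    # Hypothetical attribute record for one detected pedestrian. The names,
    # thresholds, and rules below are illustrative assumptions, not the
    # thesis's actual fourteen-situation table.
    @dataclass
    class PedestrianAttributes:
        distance_m: float      # estimated distance to the dashboard camera
        in_vehicle_path: bool  # position in the image overlaps the lane ahead
        facing_road: bool      # pedestrian faces toward the roadway
        moving: bool           # pedestrian is moving
        backlit: bool          # pedestrian is against the light

    SAFE, LOW, MEDIUM, HIGH = "safe", "low", "medium", "high"

    def danger_category(p: PedestrianAttributes) -> str:
        """Map one pedestrian's attributes to one of the four categories."""
        if p.distance_m < 10 and p.in_vehicle_path and p.moving:
            return HIGH    # close, in the vehicle's path, and moving
        if p.in_vehicle_path and (p.facing_road or p.moving):
            return MEDIUM  # in the path and likely to keep approaching
        if p.backlit or p.distance_m < 20:
            return LOW     # hard to see, or fairly close to the vehicle
        return SAFE        # far away and off the vehicle's path

    print(danger_category(PedestrianAttributes(8.0, True, True, True, False)))  # high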

The system then uses the YOLOv4 neural-network model as its backbone, carries out combination tests of YOLOv4 configurations, and improves them through pre-processing and post-processing. Three pipelines are ultimately proposed: Single YOLOv4, Two-stage Training YOLOv4, and Parallel YOLOv4. Single YOLOv4 is trained directly on the danger categories, and at prediction time a post-processing step deletes excessively overlapping prediction boxes. Two-stage Training YOLOv4 first trains on the "person" class in the images, then uses those weights to learn the danger categories; at prediction time it predicts with the second-stage weights and likewise deletes excessively overlapping boxes. Parallel YOLOv4 trains and tests two YOLOv4 networks, one on "person" and one on the danger categories; at prediction time each network's excessively overlapping boxes are deleted and the two sets of predictions are merged. The shared overlap-removal step is sketched below.
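
All three pipelines share the post-processing step that deletes excessively overlapping prediction boxes. Below is a minimal sketch, assuming a greedy IoU-based suppression with an illustrative threshold; the abstract does not state the exact overlap criterion the thesis uses.

    # Greedy IoU-based suppression: keep the highest-confidence box among
    # heavily overlapping ones. The 0.65 threshold is an assumed value.

    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    def remove_overlaps(detections, iou_threshold=0.65):
        """detections: list of (box, confidence, category) tuples."""
        kept = []
        for det in sorted(detections, key=lambda d: d[1], reverse=True):
            if all(iou(det[0], k[0]) < iou_threshold for k in kept):
                kept.append(det)
        return kept

    # Parallel YOLOv4 (sketch): clean each network's output first, then merge:
    #   merged = remove_overlaps(person_dets) + remove_overlaps(danger_dets)
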
All videos in the test dataset were shot by the author in the Zhonghe and Yonghe districts of New Taipei City; the dataset is named the Pedestrian-Endanger-Driving-Safety Evaluation Dataset. The system outputs a video in which each pedestrian is enclosed in a prediction box, the predicted degree of danger is shown above the box, and each box is colored according to one of the four categories so the degrees of danger can be told apart. Using the F1-measure as the evaluation metric, the system achieves a final score of 71.2%.
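
As a worked check of the reported score, here is the standard F1-measure in code; the counts in the example are invented purely to reproduce a value of 71.2% and are not the thesis's actual detection counts.

    # Standard F1-measure. The counts below are invented purely to
    # reproduce the reported 71.2%; they are not the thesis's real counts.

    def f1_measure(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    print(round(f1_measure(712, 288, 288), 3))  # 0.712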

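For the annotated output video, a short sketch of how each frame could be drawn, assuming OpenCV; the per-category colors are hypothetical, since the thesis assigns each category a distinct color but the abstract does not say which.

    import cv2

    # Hypothetical BGR colors, one per category; the thesis assigns each
    # category a distinct color but the abstract does not say which.
    CATEGORY_COLORS = {"safe": (0, 255, 0), "low": (0, 255, 255),
                       "medium": (0, 165, 255), "high": (0, 0, 255)}

    def draw_detections(frame, detections):
        """Draw each box and its predicted category onto one video frame.

        detections: list of ((x1, y1, x2, y2), confidence, category) tuples.
        """
        for (x1, y1, x2, y2), _conf, category in detections:
            color = CATEGORY_COLORS[category]
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
            cv2.putText(frame, category, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
        return frame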
