
Analysis of Traffic Accident Severity Prediction Using Machine Learning Methods under Imbalanced Data

MACHINE LEARNING METHODS FOR TRAFFIC ACCIDENT SEVERITY PREDICTION UNDER IMBALANCED DATA

Abstract


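Reducing the severity of traffic accidents has been a worldwide goal in recent years. Many passive safety systems, such as seat belts, airbags, and brake assist systems, have been developed to mitigate accident severity, and building models that predict accident severity is likewise a goal of many researchers. In recent years, machine learning and deep learning methods have replaced statistical methods and can achieve higher accuracy and computational efficiency. Model training requires large amounts of data, however, and accident databases suffer from data imbalance, so handling this imbalance is an important issue. This study divides traffic accident severity into three levels (death, injury, and no injury), which makes it a multi-class classification problem. Data are collected from the open database of Tainan City, and the imbalanced data are resampled with two preprocessing approaches, over-sampling and under-sampling, implemented with the SMOTE and Cluster Centroid algorithms, respectively. For model training, two classification models based on ensemble learning are adopted: Random Forest and CatBoost. The results show that on the under-sampled and over-sampled data, both models achieve accuracies above 97.69% and 86.84%, respectively. These results could in future be applied to autonomous vehicles or serve as evidence for the relevant authorities when making policy decisions.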
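The over-sampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class record and one of its nearest minority-class neighbours. This is not the study's implementation. The function name `smote`, its parameters, and the toy feature vectors are all hypothetical, and the sketch assumes purely numeric features.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Illustrative variant: the base point is drawn at random for each
    synthetic sample. Assumes every feature is numeric.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: three records of the rare "death" class
fatal = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote(fatal, n_new=3, k=2)
balanced_fatal = fatal + new_points  # minority class grown from 3 to 6 records
```

Because every synthetic point lies between two existing minority records, the new samples stay inside the region the minority class already occupies rather than duplicating records exactly.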

Parallel Abstract


Reducing traffic accident severity is an effective way to improve road safety. Many passive safety systems, such as seat belts, airbags, and brake assist systems, have been developed to reduce accident severity. In recent years, building models to predict traffic accident severity has also become a focus for many researchers. Machine learning and deep learning approaches have replaced statistical methods and can achieve higher accuracy and computational efficiency. Training these models requires large datasets, but accident datasets are usually imbalanced, so the data must be preprocessed. This study divides traffic accident severity into three levels (death, injury, and non-injury), making it a multi-class classification problem. We collect data from the Tainan open datasets and resample the imbalanced data with over-sampling and under-sampling, using the SMOTE and Cluster Centroid algorithms, respectively. For model training, we apply two classifiers based on ensemble learning: Random Forest and CatBoost. The results show that the two models achieve accuracies above 97.69% and 86.84% on the under-sampled and over-sampled datasets, respectively. These results could be applied to autonomous vehicles in the future or provide relevant authorities with evidence for decision-making.
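The under-sampling step can be sketched in the same spirit. In cluster-centroid under-sampling, the majority class is clustered with k-means and then replaced by the cluster centroids, so it shrinks to a chosen size while still covering the same regions of feature space. The code below is a minimal illustration under stated assumptions, not the study's implementation: the function name `cluster_centroids`, its parameters, and the toy data are hypothetical, and features are assumed to be numeric.

```python
import math
import random


def cluster_centroids(majority, n_keep, n_iter=20, seed=0):
    """Shrink the majority class to n_keep points using k-means centroids."""
    if not 0 < n_keep <= len(majority):
        raise ValueError("n_keep must be between 1 and len(majority)")
    rng = random.Random(seed)
    # Initialise the centroids with n_keep distinct majority samples
    centroids = [list(p) for p in rng.sample(majority, n_keep)]
    for _ in range(n_iter):
        # Assignment step: attach every sample to its nearest centroid
        clusters = [[] for _ in range(n_keep)]
        for p in majority:
            nearest = min(range(n_keep), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical toy data: six records of the frequent "injury" class in two groups
injury = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
reduced = cluster_centroids(injury, n_keep=2)  # majority class cut from 6 to 2 points
```

Note that the retained points are centroids, so they are synthetic averages rather than original accident records.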

