With the rapid development of social media, the public can now receive faster and more diverse information than television or radio news alone can offer, and can also contribute information of their own. Because information on social media is highly visible, it has become a source that companies, governments, and other organizations cannot ignore. This study uses public comments on dashcam videos posted on social media to help police identify the causes of traffic accidents. Specifically, we first collected traffic accident videos from YouTube and manually labeled each one with one of four cause categories: "human factor", "vehicle factor", "environmental factor", and "road condition factor". Next, based on user comments on these videos, we built one cause lexicon for each of the four categories. Then, for each video, we merged all of its comments and computed four feature values, "human factor count", "vehicle factor count", "environmental factor count", and "road condition factor count", which together form the training and prediction datasets. Finally, we classified the videos with four algorithms: random forest, decision tree, Bayesian classification, and support vector machine. Random forest achieved the highest accuracy, 95.4545%. The results indicate that combining the traffic accident lexicons effectively identifies the main cause of an accident, and the study also compares the strengths and weaknesses of the four algorithms.
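The feature-extraction step described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the lexicon entries, the category keys, and the function name `count_features` are hypothetical, and the example assumes English comments matched by simple case-insensitive substring counting, whereas the actual lexicons were built from YouTube comments and are not reproduced here.

```python
from typing import Dict, List

# Hypothetical toy lexicons, one per cause category (assumption for illustration;
# the study's real lexicons are derived from user comments and are not shown).
LEXICONS: Dict[str, List[str]] = {
    "human": ["speeding", "drunk", "ran the red light", "distracted"],
    "vehicle": ["brake failure", "flat tire", "engine"],
    "environment": ["rain", "fog", "glare"],
    "road": ["pothole", "blind corner", "construction"],
}


def count_features(comments: List[str]) -> Dict[str, int]:
    """Merge all comments of one video and count lexicon hits per category."""
    # Merge every comment into one lowercase string, as the abstract describes.
    merged = " ".join(comments).lower()
    # For each category, sum the occurrences of each of its lexicon terms.
    return {
        category: sum(merged.count(term) for term in terms)
        for category, terms in LEXICONS.items()
    }


# Two made-up comments standing in for the comments of a single video.
comments = [
    "The driver was speeding and clearly distracted.",
    "Heavy rain did not help, but speeding is the real problem.",
]
features = count_features(comments)
print(features)
```

Each video then yields one four-value feature vector which, paired with its manually assigned category, could be passed to any standard classifier, for example a random forest. The abstract does not name the library used for classification, so that step is left out of the sketch.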