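Current video surveillance systems usually only allow footage to be replayed and reviewed after an incident has already occurred. Learning of an accident or crime as it happens requires substantial manpower to monitor the screens, and the efficiency is low. This paper aims to detect abnormal events in videos, such as robbery, abuse, or violence, automatically and in a timely manner, so that further harm can be prevented. However, such data are very scarce and hard to collect, and manually labeling the category of every frame is even more costly. We therefore propose a weakly supervised model that is trained on videos labeled only with whether they contain an abnormal event, and that learns to find the key frames actually containing abnormal behavior. The proposed model consists of: (1) a temporal attention mechanism, which lets the model infer which events along the time axis deserve attention by predicting whether the video contains abnormal behavior; (2) a clustering model, which uses the features of the frames themselves to separate normal frames from abnormal ones; and (3) an entropy smoothness loss, which makes the predictions temporally consistent and therefore more plausible. In experiments on UCF-Crime and ShanghaiTech, two datasets of different scale and type, the proposed model shows highly competitive performance, and on UCF-Crime it achieves state-of-the-art results compared with existing methods.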
In current video surveillance systems, detecting accidents or crimes in time requires people to monitor the screens, which is expensive and inefficient. Moreover, footage of such events is rare and hard to collect, and labeling it frame by frame is costly. This paper therefore proposes a weakly supervised model that is trained with video-level ground truth, which only indicates whether a video contains abnormal events, and that locates the key frames containing abnormal behavior. The proposed model includes (1) a temporal attention module that helps the model detect key instances, (2) a cluster module that divides the video according to segment features, and (3) an entropy smoothness loss that helps stabilize the prediction curve. Experiments are conducted on the UCF-Crime and ShanghaiTech datasets. Notably, our model achieves a state-of-the-art result on the UCF-Crime dataset (AUC 84.75\%).
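
To make the weakly supervised setup more concrete, the following is a minimal sketch of how per-segment anomaly scores might be pooled into a single video-level score with temporal attention, and how temporal consistency of the score curve might be measured. The abstract gives no formulas, so everything here is an assumption: the function names `video_score` and `smoothness_penalty` are hypothetical, the pooling is a generic softmax-weighted average, and the penalty is a plain squared-difference term rather than the entropy smoothness loss itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_score(attention_logits, segment_scores):
    """Hypothetical pooling: attention-weighted average of segment scores.

    Only the video-level label supervises this value, so the attention
    weights are pushed toward the segments that explain the label.
    """
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, segment_scores))


def smoothness_penalty(segment_scores):
    """Assumed stand-in for a temporal consistency term: the sum of
    squared differences between neighbouring segment scores."""
    pairs = zip(segment_scores, segment_scores[1:])
    return sum((later - earlier) ** 2 for earlier, later in pairs)


# A score curve with one clean transition versus one that flickers.
steady = [0.1, 0.1, 0.9, 0.9]
flicker = [0.1, 0.9, 0.1, 0.9]
uniform_attention = [0.0, 0.0, 0.0, 0.0]

pooled = video_score(uniform_attention, steady)
steady_cost = smoothness_penalty(steady)
flicker_cost = smoothness_penalty(flicker)
```

With uniform attention the pooled score reduces to the mean of the segment scores, and the flickering curve receives a larger penalty than the steady one, which is the kind of behavior a smoothness term is meant to discourage.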