Extraction of Human Geometric Information and Motion Analysis from Surveillance Camera Images

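Statistics show that the number of surveillance cameras has multiplied over the years, and more than 100 million cameras have been installed and are in use worldwide. The recorded footage, however, still relies on human inspection to serve its monitoring purpose. This study collects indoor imagery from a single surveillance camera and applies current accurate and fast deep learning models, such as YOLOv4 and OpenPose, to detect pedestrians in the images quickly. A rigorous object-image relationship is established on photogrammetric principles, and the least squares method is used to estimate each pedestrian's position in object-space coordinates together with precision indicators for subsequent evaluation. Finally, diverse indicators of the pedestrian's walk, such as walking frequency, walking speed, and the pedestrian's geometric information, are extracted to build a feature vector specific to that pedestrian, and the similarity of feature vectors across surveillance camera images is analyzed to achieve cross-camera pedestrian tracking. The results demonstrate that deep learning can automatically and accurately detect the target to be tracked, taking 1 to 1.5 seconds per frame to locate a pedestrian in the image. The geometric information among the walking indicators is combined with the precision indicators obtained from the error propagation model in a weighted analysis to improve the reliability of the results. The geometric information obtained in this study has an error within ±1 cm. Using changes across the images of a single camera, the study also extracts each pedestrian's walking frequency in the scene, successfully identifying a running pedestrian at 1.91 Hz and a slowly walking pedestrian at 0.92 Hz. These diverse pedestrian features strengthen the reliability of cross-image tracking. Future indoor site management can build on this study to extract and track pedestrian information. Extended applications include tracing a suspect's escape route for the police and real-time detection of pedestrian accidents within a site, raising the value of surveillance imagery in spatial management.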
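The abstract does not state how the walking frequency is computed. As a minimal sketch of one possible approach, the dominant frequency of a per-frame signal can be found by scanning a periodogram over a plausible gait band. Everything here is an assumption made for illustration: the function name `dominant_frequency`, the choice of signal (for example the vertical image coordinate of an ankle keypoint), the 30 frames-per-second rate, and the 0.5 to 3.0 Hz search band.

```python
import math


def dominant_frequency(signal, fps, f_min=0.5, f_max=3.0, step=0.01):
    """Return the frequency (Hz) in [f_min, f_max] with the largest spectral power.

    Hypothetical helper: `signal` holds one value per frame and `fps` is the
    frame rate. The search band is an assumed range for human gait.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    best_f, best_power = f_min, -1.0
    steps = int(round((f_max - f_min) / step))
    for k in range(steps + 1):
        f = f_min + k * step
        # Correlate the signal with a cosine and a sine at frequency f.
        re = sum(c * math.cos(2 * math.pi * f * i / fps) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * f * i / fps) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


# Synthetic data only: a 0.92 Hz oscillation sampled at an assumed 30 fps for 10 s.
fps = 30.0
samples = [100.0 + 5.0 * math.sin(2 * math.pi * 0.92 * i / fps) for i in range(300)]
estimate = dominant_frequency(samples, fps)
print(f"estimated frequency: {estimate:.2f} Hz")
```

The synthetic input stands in for a slow walk; real keypoint tracks would be noisier and would need a longer observation window to separate nearby frequencies.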

Keywords

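3D image reconstruction; deep learning; human geometric information extraction; pedestrian pose analysis; intelligent surveillance cameras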

Parallel Abstract

According to statistics, around 1 billion surveillance cameras have been installed and are in use. However, video content still has to be reviewed by people to serve its monitoring purpose. This research collects indoor surveillance images. Current accurate and fast deep learning models, such as YOLOv4 and OpenPose, are used to quickly detect people in the images. A rigorous object-image relationship is constructed from the collinearity condition of photogrammetry and the least squares method, and the geometric information of each pedestrian is then calculated. Further information is extracted from the walking person, such as walking frequency and walking speed. Finally, a feature vector unique to the pedestrian is established and applied to cross-image pedestrian trajectory tracking. The results show that deep learning models can capture image information automatically and accurately, and that the spatial information of pedestrians can be calculated through the rigorous relationship. Moreover, the pedestrian geometric information is combined with precision indicators obtained from the error propagation model in a weighted analysis, which improves the reliability of the results. The geometric information obtained in this study has an error of ±1 cm. The walking frequency of each pedestrian in the scene is then extracted: a running person was successfully identified at 1.91 Hz and a slowly walking person at 0.92 Hz. Diverse pedestrian features enhance the reliability of cross-image tracking.
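The abstract mentions a weighted analysis that combines geometric estimates with precision indicators from error propagation, without giving the weighting scheme. One common choice, assumed here purely for illustration, is inverse-variance weighting of per-frame estimates. The function name `weighted_estimate` and the sample heights and standard deviations are hypothetical and are not taken from the study.

```python
import math


def weighted_estimate(values, sigmas):
    """Combine estimates using inverse-variance weights (an assumed scheme).

    `values` are per-frame estimates (e.g. body height in metres) and `sigmas`
    are their standard deviations, as an error propagation step might supply.
    Returns the weighted mean and its standard deviation.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    sigma = math.sqrt(1.0 / total)
    return mean, sigma


# Illustrative numbers only: three height estimates with differing precision.
heights_m = [1.72, 1.70, 1.75]
sigmas_m = [0.01, 0.02, 0.04]
height, sigma = weighted_estimate(heights_m, sigmas_m)
print(f"height = {height:.4f} m, sigma = {sigma:.4f} m")
```

Under this scheme the most precise frame dominates the result, and the combined standard deviation is smaller than that of any single frame, assuming the per-frame errors are independent.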

Parallel Keywords

3D reconstruction; Deep learning; Human geometry information; Human pose analysis; Intelligent surveillance

References

Bochkovskiy, A., Wang, C. Y., Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.

Bureau of police research and development. (2019). Data of police organizations. India. VSK Kaumudl.

Cao, Z., Simon, T., Wei, S. E., Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7291-7299).

Criminisi, A., Reid, I., Zisserman, A. (2000). Single view metrology. International Journal of Computer Vision, 40(2), 123-148.

He, K., Gkioxari, G., Dollár, P., Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
