
Object Detection and Location Tracking From Single-Camera Images Based on Deep Learning and Geometric Information

Advisor: 韓仁毓

Abstract


In recent years, surveillance cameras have become ubiquitous, yet conventional surveillance systems are fixed in position and constrained by image resolution and installation conditions, making it difficult to derive three-dimensional object coordinates from stereo imagery. To address this problem, this study uses dynamic image sequences captured by a single camera: the YOLOv8-pose deep learning model processes large volumes of image data to automatically detect and track pedestrians, and three-dimensional geometric information is then computed from the photogrammetric collinearity equations. The method was validated in two indoor environments. The results show that on an AMD Ryzen 5 5600X CPU with an NVIDIA RTX 3070 GPU, processing took approximately 1 second per frame, pedestrian height errors were within ±3 cm, and the error in the pitch angle of the camera's exterior orientation parameters was about 1 degree. In addition, adding head-down angle detection and upper-clothing color filtering as track-matching conditions reduced the root mean square error (RMSE) from 15.8 mm to 13.3 mm, improving the system's positioning accuracy. The approach substantially reduces computation time and labor costs, offering an efficient, low-cost alternative for real-time pedestrian localization and tracking. It is well suited to indoor surveillance scenes, adds feature indicators for pedestrian tracking, and can be extended to improve indoor facility management efficiency and shorten pedestrian accident alert times.
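The three-dimensional positioning step rests on the photogrammetric collinearity equations. For reference, their standard textbook form (the thesis's own notation may differ) relates an image point $(x, y)$ to an object point $(X, Y, Z)$:

$$x - x_0 = -f\,\frac{r_{11}(X - X_S) + r_{12}(Y - Y_S) + r_{13}(Z - Z_S)}{r_{31}(X - X_S) + r_{32}(Y - Y_S) + r_{33}(Z - Z_S)}, \qquad y - y_0 = -f\,\frac{r_{21}(X - X_S) + r_{22}(Y - Y_S) + r_{23}(Z - Z_S)}{r_{31}(X - X_S) + r_{32}(Y - Y_S) + r_{33}(Z - Z_S)}$$

where $(x_0, y_0)$ is the principal point, $f$ the focal length, $(X_S, Y_S, Z_S)$ the coordinates of the camera's perspective center, and $r_{ij}$ the elements of the rotation matrix built from the exterior orientation angles (including the pitch angle whose error the abstract reports).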
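To make the detection-and-tracking stage concrete, below is a minimal sketch of single-camera pedestrian tracking with YOLOv8-pose through the Ultralytics API. The model variant, video path, keypoint heuristic, and head-down pixel threshold are illustrative assumptions, not the thesis's actual configuration.

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")             # assumed model variant
cap = cv2.VideoCapture("indoor_scene.mp4")  # hypothetical input video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # persist=True keeps track IDs stable across consecutive frames
    results = model.track(frame, persist=True, verbose=False)
    for box, kpts in zip(results[0].boxes, results[0].keypoints):
        track_id = int(box.id) if box.id is not None else -1
        xy = kpts.xy[0]                     # (17, 2) COCO keypoints in pixels
        nose_y = float(xy[0, 1])            # index 0: nose
        shoulder_y = float((xy[5, 1] + xy[6, 1]) / 2)  # shoulder midpoint
        # Image y grows downward, so a nose close to the shoulder line
        # suggests a lowered head -- a crude stand-in for the thesis's
        # head-down angle detection (the 20 px threshold is assumed).
        head_down = (shoulder_y - nose_y) < 20
        print(f"track {track_id}: head_down={head_down}")

cap.release()
```

In a full pipeline, the tracked image coordinates would then be fed, together with the calibrated interior and exterior orientation parameters, into the collinearity equations above to recover each pedestrian's three-dimensional position.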

