In recent years, surveillance cameras have become ubiquitous, yet conventional surveillance systems are constrained by fixed mounting positions, image resolution, and installation conditions, making it difficult to exploit stereo imagery for three-dimensional object coordinate detection. To address this problem, this study uses dynamic image sequences captured by a single camera, combined with the YOLOv8-pose deep learning model for large-scale image processing, and validates automatic pedestrian detection and tracking in two indoor environments, computing three-dimensional geometric information through the photogrammetric collinearity equations. The results show that, on an R5 5600X CPU with a 3070 GPU, processing took approximately 1 second per frame, pedestrian height errors were within ±3 cm, and the error in the pitch angle of the camera's exterior orientation parameters was about 1 degree. In addition, by adding head-down angle detection and shirt-color filtering conditions, the RMSE was reduced from 15.8 mm to 13.3 mm, improving the system's positioning accuracy. The proposed method significantly reduces computation time and labor costs, offering an efficient, low-cost alternative for real-time pedestrian localization and tracking. It is suited to indoor surveillance scenes, adds pedestrian-tracking feature indicators, and can be extended to improve indoor site management efficiency and shorten pedestrian accident alert times.
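The photogrammetric collinearity equations mentioned above relate each image measurement to a 3D object point. Their standard textbook form is shown below, where $f$ is the focal length, $(x_0, y_0)$ the principal point, $(X_C, Y_C, Z_C)$ the camera's exterior-orientation position, and $m_{ij}$ the elements of the rotation matrix built from the orientation angles (this is the general formulation, not notation taken from this paper):

```latex
\begin{align}
x - x_0 &= -f\,\frac{m_{11}(X - X_C) + m_{12}(Y - Y_C) + m_{13}(Z - Z_C)}
                    {m_{31}(X - X_C) + m_{32}(Y - Y_C) + m_{33}(Z - Z_C)}\\[4pt]
y - y_0 &= -f\,\frac{m_{21}(X - X_C) + m_{22}(Y - Y_C) + m_{23}(Z - Z_C)}
                    {m_{31}(X - X_C) + m_{32}(Y - Y_C) + m_{33}(Z - Z_C)}
\end{align}
```

Each detected keypoint contributes two such equations; with a single camera, an additional constraint (such as a known ground plane for the feet keypoint) is needed to solve for the three unknown object coordinates.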
In recent years, surveillance cameras have become ubiquitous, yet conventional surveillance systems are typically fixed in place and constrained by image resolution and installation conditions, making it difficult to use stereo images for three-dimensional object coordinate detection. To address this, this study uses dynamic image sequences captured by a single camera, combined with the YOLOv8-pose deep learning model to process large volumes of image data, and validates the approach in two indoor environments: the system automatically detects and tracks pedestrians and computes three-dimensional geometric information from the photogrammetric collinearity equations. The results indicate that, on an R5 5600X CPU and a 3070 GPU, processing took approximately 1 second per frame, pedestrian height errors were within ±3 cm, and the error in the pitch angle of the camera's exterior orientation parameters was around 1 degree. Additionally, the study incorporates head-down angle detection and clothing-color filtering, reducing the root mean square error (RMSE) from 15.8 mm to 13.3 mm and thereby improving the system's localization accuracy. This approach significantly reduces computation time and labor costs, providing an efficient, low-cost solution for real-time pedestrian localization and tracking. It is particularly suited to indoor surveillance scenarios, enabling improved indoor management efficiency, shorter pedestrian accident alert times, and richer pedestrian tracking feature indicators for practical applications.
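The kind of single-camera height estimation the abstract describes can be sketched as a ground-plane back-projection: the feet keypoint's image ray is intersected with the floor (Z = 0), and the head keypoint's ray is evaluated at the same depth to recover the person's height. The sketch below is a minimal illustration of this geometry only; the camera parameters (focal length, 3 m mounting height, 10° pitch) and the axis conventions are assumptions for the example, not values from the paper.

```python
import numpy as np

def rotation_world_from_cam(pitch_rad):
    """Camera axes (x right, y down, z forward) expressed in a world
    frame (X right, Y forward, Z up) for a camera pitched downward."""
    s, c = np.sin(pitch_rad), np.cos(pitch_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,  -s,   c],
                     [0.0,  -c,  -s]])

def project(P, C, R, f, cx, cy):
    """Collinearity (pinhole) projection of world point P to pixels."""
    p = R.T @ (np.asarray(P, float) - C)      # world -> camera frame
    return cx + f * p[0] / p[2], cy + f * p[1] / p[2]

def pixel_ray(u, v, R, f, cx, cy):
    """World-frame direction of the viewing ray through pixel (u, v)."""
    return R @ np.array([u - cx, v - cy, f])

def estimate_height(feet_px, head_px, C, R, f, cx, cy):
    """Height of a vertical person standing on the ground plane Z = 0."""
    d_f = pixel_ray(*feet_px, R, f, cx, cy)
    t = -C[2] / d_f[2]                        # feet ray meets Z = 0
    foot = C + t * d_f
    d_h = pixel_ray(*head_px, R, f, cx, cy)
    s = (foot[1] - C[1]) / d_h[1]             # head ray at the same depth
    return (C + s * d_h)[2]                   # Z coordinate of the head

if __name__ == "__main__":
    f, cx, cy = 1000.0, 640.0, 360.0          # assumed interior orientation
    C = np.array([0.0, 0.0, 3.0])             # camera 3 m above the floor
    R = rotation_world_from_cam(np.radians(10.0))  # assumed 10 deg pitch
    # Synthetic 1.70 m person at 5 m depth: project, then invert.
    feet = project([0.5, 5.0, 0.0], C, R, f, cx, cy)
    head = project([0.5, 5.0, 1.70], C, R, f, cx, cy)
    print(round(estimate_height(feet, head, C, R, f, cx, cy), 3))
```

The round trip recovers the synthetic person's height, illustrating how pixel-level keypoints plus known exterior orientation yield metric 3D information from one camera.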