
3D Positioning of Pedestrians in Construction Sites and Automatic Recognition and 2D Positioning of Surveying Marks

Advisor: I-Cheng Yeh (葉怡成)

Abstract


With the rapid development of deep learning, pedestrian recognition has become quite mature. Photogrammetry is likewise mature: with dual-image positioning, common points in two images can be located in 3D. Combining pedestrian recognition with photogrammetry therefore makes automated 3D positioning of workers on construction sites possible. Pedestrian recognition augmented with 3D coordinates can be used to manage and monitor construction sites, which is of great value for improving productivity and site safety. However, the literature on this topic is sparse, mainly because dual-image positioning requires first finding the common points in the two images, that is, matching the workers between the left and right images; this requires identifying not only pedestrians but also their identities, which is difficult. To avoid matching workers between the left and right images, this thesis performs 3D positioning from a single image. The study has two main objectives: (1) 3D positioning of object-space coordinates by single-image photogrammetry; (2) recognition of the on-site marks used as known points for photogrammetric resection, and 2D positioning of their image-plane coordinates. The research methods are: (1) converting dual-image positioning into single-image positioning by adding a constraint, such as the elevation of the pedestrian's standing point; (2) automatic recognition and 2D positioning of the known-point marks with deep learning. The results show that (1) the mean pedestrian positioning errors of conventional dual-image photogrammetry and of the single-image known-elevation, known-body-height, and known-distance methods are 0.28 m, 0.45 m, 0.24 m, and 0.14 m, respectively, so the single-image methods can match the accuracy of the dual-image method; (2) sensitivity analysis shows that a 10 cm error in the assumed elevation, body height, or distance causes roughly a 20 cm 3D positioning error; (3) the deep-learning results show that the precision, recall, mAP, and 2D positioning error (pixels) of the three mark-recognition models are 94%, 37%, 45%, and 6.99 for the single-label single-class method; 84%, 63%, 55%, and 3.44 for the multi-label single-class method; and 97%, 64%, 58%, and 3.76 for the multi-label multi-class method. Deep learning can thus recognize and locate the marks accurately.
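The single-image known-elevation method summarized in the abstract amounts to intersecting the camera ray through the pedestrian's foot point with a horizontal plane at the assumed ground elevation. Below is a minimal sketch under that interpretation; the function name and the identity-rotation test geometry are illustrative assumptions, not the thesis implementation.

```python
def single_image_position(cam_center, rotation, xy_image, focal, z_known):
    """Intersect the image ray with the horizontal plane Z = z_known.

    cam_center : (Xc, Yc, Zc) camera perspective center in object space
    rotation   : 3x3 rotation matrix (object space -> camera space), nested lists
    xy_image   : (x, y) image-plane coordinates of the pedestrian's foot point
    focal      : focal length, in the same units as xy_image
    z_known    : assumed elevation of the standing point
    """
    # Collinearity condition: the image vector (x, y, -f) in the camera frame,
    # rotated back into object space, gives the ray direction.
    img_vec = (xy_image[0], xy_image[1], -focal)
    d = [sum(rotation[j][i] * img_vec[j] for j in range(3)) for i in range(3)]
    # Scale factor that carries the ray from the camera center to the plane.
    t = (z_known - cam_center[2]) / d[2]
    return tuple(cam_center[i] + t * d[i] for i in range(3))
```

For example, with the camera looking straight down from (0, 0, 10) (identity rotation), a foot point at image coordinates (0.01, 0.02), a 0.05-unit focal length, and an assumed ground elevation of 0, the ray meets the plane at (2, 4, 0).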

Abstract (English)


With the rapid development of deep learning, pedestrian recognition technology has become quite mature. Photogrammetry is also mature: with dual-image positioning, common points in two images can be located in 3D. Therefore, if pedestrian recognition and photogrammetry are combined, automated 3D positioning of workers on construction sites becomes possible. Pedestrian recognition with 3D coordinates can be used to manage and monitor construction sites, which is of great value for improving productivity and site safety. However, there is little literature on this topic, mainly because dual-image positioning requires first finding the common points in the two images, i.e., matching the workers between the left and right images; this means identifying not only pedestrians but also their identities, which is difficult. To avoid matching workers between the left and right images, this thesis uses a single image for 3D positioning. The main objectives of this study are twofold: (1) to perform 3D positioning of object-space coordinates by single-image photogrammetry; (2) to recognize the on-site marks used as known points for photogrammetric resection, and to determine their 2D image-plane coordinates. The research methods are: (1) converting dual-image positioning into single-image positioning using an additional constraint, such as the elevation of the pedestrian's standing point; (2) automatic recognition and 2D positioning of the known-point marks using deep learning. The results show that (1) the mean pedestrian positioning errors of conventional dual-image photogrammetry and of the three single-image methods (known elevation, known body height, and known distance) were 0.28 m, 0.45 m, 0.24 m, and 0.14 m, respectively; therefore, the single-image methods can achieve the accuracy of the dual-image method. (2) Sensitivity analysis shows that a 10 cm error in the assumed elevation, body height, or distance results in a 3D positioning error of about 20 cm. (3) The deep-learning results show that the precision, recall, mAP, and 2D positioning error (in pixels) of the three mark-recognition models are 94%, 37%, 45%, and 6.99 for the single-label single-class method; 84%, 63%, 55%, and 3.44 for the multi-label single-class method; and 97%, 64%, 58%, and 3.76 for the multi-label multi-class method. These results indicate that deep learning can accurately recognize and locate the marks.

Keywords: construction site, photogrammetry, pedestrian localization, deep learning, image recognition, 2D positioning of surveying marks
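The 2D positioning error reported for the mark-recognition models is a pixel distance between a detected mark and its ground truth. A minimal sketch of such a metric, assuming marks are localized by the centers of axis-aligned bounding boxes (the box format and function name are assumptions for illustration, not the thesis's exact evaluation code):

```python
import math

def center_error_px(pred_box, gt_box):
    """Euclidean distance in pixels between the centers of a predicted and a
    ground-truth bounding box, each given as (x_min, y_min, x_max, y_max)."""
    pred_cx = (pred_box[0] + pred_box[2]) / 2
    pred_cy = (pred_box[1] + pred_box[3]) / 2
    gt_cx = (gt_box[0] + gt_box[2]) / 2
    gt_cy = (gt_box[1] + gt_box[3]) / 2
    return math.hypot(pred_cx - gt_cx, pred_cy - gt_cy)
```

For example, a predicted box (10, 10, 30, 30) against a ground-truth box (13, 14, 33, 34) has centers (20, 20) and (23, 24), giving an error of 5.0 pixels; averaging this over all detected marks yields a figure comparable to the 3.44-6.99 pixel errors above.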

