
卷積神經網絡的架構與資料擴充技術對工地人員姿勢識別之影響

The influence of the architecture of convolutional neural networks and data augmentation on the posture recognition of workers on construction site

Advisor: 葉怡成

Abstract


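Whether on public works or large building projects, observing worker behavior on site through pedestrian detection yields important information that can improve construction efficiency and safety. However, construction sites are complex environments: occlusions, lighting changes, and variations in worker posture are common, and traditional machine learning methods cannot detect workers effectively under these conditions. In addition, few previous studies have addressed recognizing the type of posture. This study therefore uses the YOLOv4 deep learning algorithm to recognize site workers in three postures (standing, bending, and squatting) and optimizes the parameters and architecture of the convolutional neural network to improve detection accuracy. The optimized parameters and architecture comprise (1) five data augmentation techniques, (2) the activation function, (3) the transfer learning cut-off point, (4) the learning rate, and (5) the maximum number of weight updates, for a total of nine factors. A two-level fractional factorial experimental design was used to arrange 16 experimental runs efficiently and rationally, and effect analysis was then used to identify the best combination of factor levels. The effect analysis shows that only the following factors are significant: (1) Mosaic, (2) Fourier mixing, (3) the transfer learning cut-off point, and (4) the number of weight updates. On 80 construction site images (325 workers), the best combination of factor levels achieved a precision of 67.0%, a recall of 85.0%, and an mAP of 83.7%, with an average processing time of 0.038 sec per image. The results indicate that optimizing the parameters and architecture of the convolutional neural network can improve the accuracy of recognizing site workers in various postures.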
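The three reported metrics follow the standard object detection definitions. The sketch below, written in Python with hypothetical helper names (`precision`, `recall`, `f1_score`), shows how precision and recall are obtained from true-positive, false-positive, and false-negative counts. The counts in the toy example are invented for illustration, and the F1 score and images-per-second figure derived from the reported 67.0%, 85.0%, and 0.038 sec are illustrative calculations, not results stated in the thesis.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted boxes that match a real worker."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real workers that were detected."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Toy counts, hypothetical and unrelated to the thesis data.
toy_precision = precision(tp=8, fp=2)
toy_recall = recall(tp=8, fn=2)

# Values reported in the abstract.
reported_precision = 0.670
reported_recall = 0.850
seconds_per_image = 0.038

# Derived quantities (assumptions for illustration, not reported in the source).
derived_f1 = f1_score(reported_precision, reported_recall)
images_per_second = 1.0 / seconds_per_image

print(f"toy precision={toy_precision:.2f}, toy recall={toy_recall:.2f}")
print(f"derived F1={derived_f1:.3f}, throughput={images_per_second:.1f} images/sec")
```

A precision well below recall, as reported here, means the detector misses few workers but produces a comparatively large share of false or mislabeled boxes.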

Parallel Abstract


Whether on infrastructure or large building projects, observing construction workers on site through pedestrian detection provides important information that can improve construction efficiency and safety. However, construction sites are complex environments: occlusions, lighting changes, and variations in worker posture are common, and traditional machine learning methods cannot detect workers effectively under these conditions. Moreover, there is little literature on identifying pedestrian posture types. This study therefore used the YOLOv4 deep learning algorithm to identify three postures (standing, bending, and crouching) of site workers and optimized the parameters and architecture of the convolutional neural network to improve the accuracy of pedestrian posture detection. The optimized parameters and architecture include (1) five data augmentation techniques, (2) the activation function, (3) transfer learning, (4) the learning rate, and (5) the maximum number of weight updates, for a total of nine factors. A two-level fractional factorial experimental design was used to arrange 16 sets of experiments efficiently and rationally, and the optimal combination of factor levels was identified through effect analysis. The analysis showed that only four factors were significant: two data augmentation techniques (Mosaic and Fourier mixture), the learning rate, and the maximum number of weight updates. On 80 construction site images (325 workers), the optimal factor level combination achieved a precision of 67.0%, a recall of 85.0%, and an mAP (mean average precision) of 83.7%. The average processing time per image was 0.038 sec. The results show that optimizing the parameters and architecture of the convolutional neural network can improve the accuracy of identifying site worker posture.
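Nine two-level factors in 16 runs corresponds to a 2^(9-5) fractional factorial design. The sketch below is one possible way to build such a design and estimate main effects in Python with NumPy. The factor labels A to J, the generators (E=ABC, F=BCD, G=ACD, H=ABD, J=ABCD, a common textbook choice), and the synthetic response are all assumptions made for illustration; the abstract does not state which generators or factor coding the thesis used.

```python
import itertools

import numpy as np

# Hypothetical labels for the four base factors of the full 2^4 design.
BASE = ["A", "B", "C", "D"]
# Assumed generators for the five remaining factors (not taken from the thesis).
GENERATORS = {"E": "ABC", "F": "BCD", "G": "ACD", "H": "ABD", "J": "ABCD"}


def fractional_factorial():
    """Return (names, design); design is a 16x9 matrix of -1/+1 levels."""
    base = np.array(list(itertools.product([-1, 1], repeat=len(BASE))))
    columns = {name: base[:, i] for i, name in enumerate(BASE)}
    for name, word in GENERATORS.items():
        # A generated factor is the elementwise product of its base columns.
        columns[name] = np.prod([columns[c] for c in word], axis=0)
    names = BASE + list(GENERATORS)
    return names, np.column_stack([columns[n] for n in names])


def main_effects(design, response):
    """Mean response at the +1 level minus mean response at the -1 level."""
    response = np.asarray(response, dtype=float)
    return np.array([
        response[design[:, j] == 1].mean() - response[design[:, j] == -1].mean()
        for j in range(design.shape[1])
    ])


names, design = fractional_factorial()

# Synthetic, noise-free response (not thesis data): only A and J matter.
response = 80 + 3 * design[:, 0] - 2 * design[:, 8]
effects = main_effects(design, response)

for name, effect in zip(names, effects):
    print(f"{name}: {effect:+.1f}")
```

With this synthetic response the estimated effects are 6 for A, -4 for J, and 0 for the other seven factors, which is the kind of pattern effect analysis uses to separate significant factors from insignificant ones. In a design this small, main effects are aliased with two-factor interactions, so a real analysis would need to take that into account.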

