  • Thesis

工地人員姿勢識別的資料擴充與卷積神經網絡架構之最佳化

Optimization of Data Augmentation and Convolutional Neural Network for Posture Recognition of Workers on Construction Sites

Advisor: 葉怡成

Abstract


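With the rapid growth of infrastructure construction in recent years, the efficiency and safety of workers on construction sites have drawn increasing attention, and observing worker behavior through pedestrian detection yields much important information. Pedestrian detection is already quite mature, but posture recognition has received little study. This study therefore uses the YOLOv4 deep learning algorithm to recognize construction workers in three postures (standing, bending over, and squatting). To improve detection accuracy, nine factors are considered: the use of five data augmentation techniques, the type of activation function, the boundary point of transfer learning, the learning rate, and the maximum number of weight updates. Design of Experiments (DOE) is used to generate a two-stage experimental design with 45 experiments in total. The trained models are evaluated by mean average precision (mAP). Self-collected images serve as training data and a construction-site test dataset serves as validation data, so that the evaluation reflects the models' true recognition ability. With mAP as the dependent variable and the nine factors as independent variables, linear regression analysis finds that only three factors are significant at the 5% level: the transfer learning boundary point, the maximum number of weight updates, and the activation function. Nonlinear regression analysis finds that, in addition to these three factors, two interactions (transfer learning boundary point × learning rate and transfer learning boundary point × activation function) and the curvilinear effect of one factor (maximum number of weight updates) are significant at the 5% level. Two data augmentation factors, the rotation angle method and the mosaic method, are significant at the 10% level. Empirical verification of the optimal design gives mAP = 78%, which does not exceed the best result (82%) among the original 45 experiments.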
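As an illustration of the evaluation metric named above, the following is a minimal sketch of one common way to compute average precision (AP) per class and then average over classes to obtain mAP. It assumes all-point interpolation of the precision-recall curve and that detections have already been matched to ground truth; the thesis does not state which AP variant or matching rule it uses, and the function names and toy inputs here are hypothetical.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (an assumed AP variant)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)            # true positives among the top-k detections
    cum_fp = np.cumsum(1.0 - tp)      # false positives among the top-k detections
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope: make precision non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum precision over each increment in recall.
    recall_prev = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - recall_prev) * precision))

def mean_average_precision(per_class):
    """per_class maps a class name to (scores, is_true_positive, num_ground_truth)."""
    return float(np.mean([average_precision(*v) for v in per_class.values()]))

# Toy inputs for illustration only; these are not results from the thesis.
toy = {
    "standing": ([0.9, 0.8, 0.7], [1, 0, 1], 2),
    "bending": ([0.9, 0.8], [1, 1], 2),
}
toy_map = mean_average_precision(toy)
```

In this toy case the "bending" class has a perfect ranking, while the false positive ranked second in the "standing" class lowers its AP.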

Parallel Abstract


In recent years, with the vigorous development of infrastructure construction, the efficiency and safety of personnel on construction sites have received growing attention. Observing worker behavior on site through pedestrian detection yields much important information. Pedestrian detection technology is quite mature, but posture recognition has rarely been studied. This study therefore uses the YOLOv4 deep learning algorithm to recognize three postures of construction workers: standing, bending over, and squatting. To improve detection accuracy, this thesis considers nine factors: the use of five data augmentation techniques, the type of activation function, the boundary point of transfer learning, the learning rate, and the maximum number of batches (weight updates). Design of Experiments (DOE) is used to produce a two-stage experimental design with 45 experiments. Self-collected images serve as training data and images of construction workers collected from websites serve as validation data, so that the evaluation reflects the true recognition ability of the models. Mean Average Precision (mAP) is used to evaluate the trained models. Linear regression of mAP on the nine factors found that only three factors were significant at the 5% level: the stop-backward point (the boundary point of transfer learning), the maximum number of batches, and the activation function. Non-linear regression found that, in addition to these three factors, two interactions (maximum of batches × learning rate and maximum of batches × activation function) and the curvilinear effect of one factor (maximum number of batches) were significant at the 5% level. In addition, two data augmentation factors, the angle method and the mosaic method, were significant at the 10% level. Empirical verification of the optimum design derived from the non-linear regression model gave mAP = 78%, which does not exceed the best result (82%) among the original 45 experiments.
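To make the non-linear regression step concrete, the following is a minimal sketch of fitting a second-order model with main effects, one interaction term, and one curvilinear (squared) term by ordinary least squares. The two coded factors, the coefficient values, and the 3 × 3 grid are hypothetical and synthetic. They are not the thesis design or its data, and the significance tests reported in the abstract are not reproduced here.

```python
import numpy as np

def design_matrix(x1, x2):
    """Columns: intercept, x1, x2, x1*x2 (interaction), x2**2 (curvature)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x2 ** 2])

def fit_second_order(x1, x2, y):
    """Ordinary least squares fit; returns the five model coefficients."""
    coef, *_ = np.linalg.lstsq(design_matrix(x1, x2), y, rcond=None)
    return coef

# Synthetic illustration only: coded factor levels on a 3 x 3 grid.
levels = np.array([-1.0, 0.0, 1.0])
x1, x2 = (g.ravel() for g in np.meshgrid(levels, levels))

# Hypothetical coefficients used to generate a noise-free response.
true_coef = np.array([0.70, 0.05, 0.03, -0.02, -0.04])
y = design_matrix(x1, x2) @ true_coef

coef = fit_second_order(x1, x2, y)
```

Because the synthetic response is noise-free and the design matrix has full column rank, the fit recovers the generating coefficients. With real experimental data, each coefficient would additionally need a standard error and a p-value before a 5% or 10% significance claim could be made.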
