Wearing safety helmets and safety harnesses when working at heights on construction sites is an effective means of protecting workers from accidents. An improved YOLOv5-based method for detecting safety harnesses and helmets is proposed to improve the detection of small objects in complex backgrounds. First, a mixed attention mechanism is introduced into the backbone network, which effectively suppresses the negative impact of complex backgrounds and improves detection performance. Second, a cross-layer complementary feature fusion network is constructed to strengthen the fusion of high-level and low-level features, improving the model's ability to detect small and medium-sized objects. Finally, DIoU-NMS is adopted to reduce the over-suppression of bounding boxes caused by closely spaced objects. Extensive experiments on a self-built dataset show that the mean Average Precision (mAP) of the proposed method reaches 94.5%, surpassing mainstream detection methods, while running at 48 frames per second.
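The DIoU-NMS step mentioned above replaces plain IoU with Distance-IoU in the suppression test, so that nearby but distinct objects (e.g. adjacent workers) are less likely to be wrongly discarded. The following is a minimal NumPy sketch for illustration only; the function and parameter names are ours, not from the paper, and the actual implementation may differ:

```python
import numpy as np

def diou_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS using Distance-IoU instead of plain IoU.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,).
    A candidate is suppressed only if its IoU with the kept box,
    minus a normalized center-distance penalty, exceeds iou_thresh.
    """
    # Box centers, used for the distance penalty term.
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Plain IoU between box i and the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Squared center distance, normalized by the squared diagonal
        # of the smallest box enclosing both boxes.
        d2 = (cx[i] - cx[rest]) ** 2 + (cy[i] - cy[rest]) ** 2
        ex1 = np.minimum(boxes[i, 0], boxes[rest, 0])
        ey1 = np.minimum(boxes[i, 1], boxes[rest, 1])
        ex2 = np.maximum(boxes[i, 2], boxes[rest, 2])
        ey2 = np.maximum(boxes[i, 3], boxes[rest, 3])
        c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
        diou = iou - d2 / c2
        # Keep only boxes whose DIoU with box i is below the threshold.
        order = rest[diou <= iou_thresh]
    return keep
```

Because the center-distance term is subtracted from the IoU, two heavily overlapping boxes whose centers are far apart score a lower DIoU and are more likely to survive, which is the intended behavior for dense scenes.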