ResEff-YOLO: Accuracy Enhancement of YOLOv8 through Integration of ResNet, SPPF, and EfficientHead Modules

As computer vision becomes widely used in everyday applications such as autonomous vehicles, portable applications, augmented reality systems, and medical image analysis, architectures that combine low complexity with high accuracy have become a priority. Many object detection methods have been developed to raise accuracy while reducing complexity, including the R-CNN and YOLO families. In this study, we propose ResEff-YOLO, an enhanced version of the YOLOv8 model that further improves detection accuracy and efficiency. Specifically, we adopt EfficientHead as the detection head, which makes better use of computational resources and improves inference speed while maintaining detection accuracy. In the backbone network, we incorporate the ResNet18d module together with the SPPF_LSKA module, which strengthens the network's ability to learn multi-scale features beyond what traditional convolutional layers provide. The deep stem structure of ResNet18d helps retain more spatial information, while SPPF_LSKA adds Large Separable Kernel Attention (LSKA) to the SPPF feature extractor, improving multi-scale feature extraction and the handling of complex scenes. Experiments on the VOC dataset show that ResEff-YOLO outperforms the YOLOv8 series, improving mean average precision (mAP) by approximately 4% and mAP50-95 by 4.2%.
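
To make the multi-scale role of SPPF concrete, the following is a minimal pure-Python sketch of the pooling stage of a plain SPPF block as used in YOLOv8: three chained 5x5 stride-1 max pools whose outputs are concatenated with the input. It is an illustration under stated assumptions, not the implementation used in this study: the LSKA attention and the surrounding convolutions are omitted, a single 2-D channel stands in for a feature map, and the names `max_pool2d` and `sppf_pools` are hypothetical.

```python
def max_pool2d(grid, k):
    """Stride-1 max pooling; windows are clipped at the border,
    which matches padding with -inf, so the output keeps the input size."""
    r = k // 2
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def sppf_pools(grid, k=5):
    """Return the four maps a plain SPPF-style block concatenates:
    the input plus three chained k x k pools (attention omitted)."""
    y1 = max_pool2d(grid, k)
    y2 = max_pool2d(y1, k)
    y3 = max_pool2d(y2, k)
    return [grid, y1, y2, y3]


# Toy 12x12 single-channel "feature map" (values are arbitrary).
toy = [[(7 * i + 13 * j) % 17 for j in range(12)] for i in range(12)]
x, y1, y2, y3 = sppf_pools(toy)

# Chained 5x5 pools cover growing receptive fields: 5x5, 9x9, 13x13.
print(y2 == max_pool2d(toy, 9))
print(y3 == max_pool2d(toy, 13))
```

Chaining small pools reproduces the 5x5, 9x9, and 13x13 receptive fields of the parallel pools in the original SPP design while reusing intermediate results, which is why SPPF is the cheaper way to gather multi-scale context.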

Keywords

Object Detection; YOLOv8; Multi-Scale Feature Extraction

