An improved RT‐DETR model is proposed to address the challenges of complex texture details, diverse target scales, and the demand for high computational efficiency in tea recognition tasks. A convolutional gated linear unit (CGLU) is introduced to achieve dynamic weight adjustment, enhancing robustness to complex backgrounds and variable targets. An OmniKernel module is designed to integrate multi‐directional and multi‐scale convolution kernels, improving the modeling of the directionality and scale diversity of tea textures. Combined with a frequency‐domain feature enhancement module (FSAM), global context and local detail features are modeled jointly to suppress interference from background noise. A feature segmentation and fusion module (SPDConv) is proposed to improve the global consistency of features through spatial segmentation and channel fusion. Experiments show that, under the synergistic effect of enhanced input features and a multi‐scale feature pyramid network, the improved model significantly improves tea recognition accuracy and adaptability to complex scenes while maintaining high computational efficiency, providing a high‐precision and robust solution for tea detection tasks.
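The abstract describes SPDConv only at a high level, as spatial segmentation followed by channel fusion, so the following PyTorch sketch is an illustrative reading of that idea rather than the authors' implementation; the class name, channel counts, space‐to‐depth factor, and activation are assumptions made for the example.

```python
# Hypothetical sketch (not the released code): one way to realize the
# "spatial segmentation and channel fusion" idea behind SPDConv.
import torch
import torch.nn as nn


class SPDConvSketch(nn.Module):
    """Space-to-depth segmentation followed by a channel-fusion convolution."""

    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # After segmentation the channel count grows by scale**2; a 3x3
        # convolution fuses the rearranged channels down to out_channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels * scale * scale, out_channels, 3,
                      padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial segmentation: each scale x scale neighborhood is moved
        # into the channel dimension without losing information.
        x = nn.functional.pixel_unshuffle(x, self.scale)
        # Channel fusion: mix the segmented features for global consistency.
        return self.fuse(x)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)        # dummy backbone feature map
    block = SPDConvSketch(64, 128, scale=2)
    print(block(feat).shape)                 # torch.Size([1, 128, 40, 40])
```

The design choice illustrated here is that downsampling via space‐to‐depth rearrangement keeps fine texture detail that strided convolution or pooling would discard, which is consistent with the stated goal of preserving global feature consistency.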