Abstract: Substation defect detection in practical environments suffers from insufficient real-time performance, high computational resource consumption, and low detection accuracy. To address these problems, this paper proposes a lightweight object detection model, ELT-RTDETR. First, EfficientFormerV2 is adopted as the backbone network; it combines local convolutions with a lightweight Transformer design to significantly reduce the number of model parameters and the computational overhead. Second, a lightweight multi-scale feature pyramid network (LMSFPN) is proposed, which strengthens the representation of multi-scale defect features through multi-scale depth-wise convolutions, weighted fusion, and efficient upsampling, while reducing redundant computation. Finally, a token statistics self-attention cross-feature enhancement (TSSACFE) module, built on the token statistics self-attention (TSSA) mechanism, is introduced. This module optimizes feature interaction through local statistical modeling and low-dimensional projection, improving the robustness of small-defect detection. On a self-built substation equipment defect dataset, ELT-RTDETR reaches a detection accuracy of 82.1%, which is 7.3% higher than that of the original RT-DETR, while the computational cost and parameter count are reduced by 63.2% and 50.7%, respectively. Ablation experiments and comparisons with mainstream algorithms show that the proposed model outperforms the YOLO series and existing RT-DETR variants in accuracy, model size, and inference efficiency, especially for defect categories such as meter shell damage and silica gel canister discoloration. This study provides an efficient solution for real-time defect detection in substation environments and has significant potential for engineering applications.
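The weighted fusion step mentioned for LMSFPN can be illustrated with a minimal sketch. The abstract does not specify the fusion formula, so the code below assumes a common choice, fast normalized fusion with learnable non-negative weights; the function name `weighted_fusion`, the `eps` parameter, and the use of flat feature vectors instead of feature maps are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def weighted_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors using normalized non-negative weights.

    Assumed form (not confirmed by the paper):
        out = sum_i(w_i * x_i) / (sum_i(w_i) + eps), with w_i clipped at 0.
    """
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature vector")
    if not features:
        raise ValueError("at least one feature vector is required")

    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # Clip weights at zero (ReLU) so no input contributes negatively.
    clipped = [max(w, 0.0) for w in weights]
    # Normalize so the contributions sum to approximately one.
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Element-wise weighted sum across the input feature vectors.
    return [
        sum(n * f[i] for n, f in zip(normalized, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # stand-in for a high-resolution feature
    deep = [3.0, 4.0]      # stand-in for an upsampled low-resolution feature
    print(weighted_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise mean of the inputs; a weight that training drives to zero or below removes that input from the fusion entirely, which is one way such a scheme can let a network favor the scale most useful for small defects.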