Abstract: The transient stability of power systems is crucial for secure grid operation and continuous power supply, and accurately identifying the transient dominant instability mode (DIM) is key to formulating effective emergency control strategies. To address the imbalanced class distribution of power system transient characteristic data, this paper proposes a two-stage decoupling learning and multi-teacher distillation (TSDM) framework. The framework uses a two-stage decoupled training strategy to jointly optimize representation learning and classifier training. In the first stage, multiple teacher models are trained with instance sampling to learn the global feature distribution of the transient characteristic data. In the second stage, the student model is trained with class-balanced sampling; it inherits high-order feature representations from the teachers through feature distillation rather than directly reusing their classifier weights, which mitigates the propagation of classifier bias. In addition, the feature vectors and the classifier weights are each normalized, eliminating prediction biases caused by differences in feature scale. Finally, a separable Transformer module serves as the backbone network; through parameter sharing and an optimized attention design, it captures the spatiotemporal correlations of long time sequences, so that feature extraction performance remains stable regardless of sequence length. Simulation results on the CEPRI 36-node system show that the proposed method achieves a classification accuracy of 98.61% in identifying the power system DIM, with a notably higher recognition rate for minority-class samples, providing an effective solution for power system transient stability analysis.
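The two training stages differ mainly in how mini-batches are drawn. A common way to express this (used in the decoupling literature, and assumed here rather than taken from this paper) is to draw class j with probability p_j = n_j^q / Σ_i n_i^q, where n_j is the class count: q = 1 gives instance sampling (every sample equally likely) and q = 0 gives class-balanced sampling (every class equally likely). The sketch below uses only the Python standard library; the function names and the integer label encoding (0 = stable, 1 and 2 = two instability modes) are hypothetical.

```python
import random
from collections import Counter


def class_sampling_probs(labels, q):
    """Per-class draw probability p_j = n_j**q / sum_i n_i**q.

    q = 1 -> instance sampling (stage 1, teachers)
    q = 0 -> class-balanced sampling (stage 2, student)
    """
    counts = Counter(labels)
    weights = {c: n ** q for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def sample_indices(labels, q, k, seed=0):
    """Draw k sample indices: choose a class with p_j, then a sample uniformly within it."""
    rng = random.Random(seed)
    probs = class_sampling_probs(labels, q)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    classes = sorted(probs)
    picks = rng.choices(classes, weights=[probs[c] for c in classes], k=k)
    return [rng.choice(by_class[c]) for c in picks]


# Hypothetical imbalanced label set: 90 stable cases, 8 and 2 cases of two instability modes.
labels = [0] * 90 + [1] * 8 + [2] * 2
print(class_sampling_probs(labels, q=1))  # {0: 0.9, 1: 0.08, 2: 0.02}
print(class_sampling_probs(labels, q=0))  # each class drawn with probability 1/3
```

Under instance sampling the rarest mode appears in only about 2% of draws, whereas class-balanced sampling raises it to about one third, which is why the second stage is the one responsible for the classifier's treatment of minority classes.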
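Normalizing both the feature vector and each classifier weight vector turns every logit into a scaled cosine similarity, s · cos θ_j, so neither a large feature magnitude nor a large weight norm (typical of majority classes) can dominate the prediction. The following is a minimal pure-Python sketch of such a normalized classifier head; the scale factor `scale = 16.0` and the function names are illustrative assumptions, not values reported in the paper.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale vector v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def normalized_logits(feature, weights, scale=16.0):
    """Logit for class j = scale * <f / ||f||, w_j / ||w_j||>, i.e. a scaled cosine.

    `scale` is a hypothetical temperature; without it, cosine logits in [-1, 1]
    would give overly flat softmax outputs.
    """
    f = l2_normalize(feature)
    return [scale * sum(a * b for a, b in zip(f, l2_normalize(w))) for w in weights]


# Hypothetical 2-D example: class 0 has a much larger weight norm than class 1.
W = [[10.0, 0.0], [0.0, 1.0]]
print(normalized_logits([3.0, 4.0], W))
print(normalized_logits([300.0, 400.0], W))  # same logits: feature scale no longer matters
```

Because only directions enter the logits, rescaling a feature by 100 leaves the prediction unchanged, and the tenfold weight norm of class 0 gives it no advantage.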
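The abstract states that the student learns from several teachers through feature distillation rather than by copying their classifier weights, but it does not say how the teachers' features are combined or weighted. The sketch below assumes one simple choice: average the L2-normalized teacher features and penalize the mean squared distance to the normalized student feature, added to the student's cross-entropy with a hypothetical weight `lam`. It also assumes teacher and student features share the same dimension; a real implementation might need a projection layer. All names are illustrative.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale vector v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def multi_teacher_feature_loss(student_feat, teacher_feats):
    """MSE between the normalized student feature and the mean of the
    normalized teacher features (averaging is an assumed aggregation rule)."""
    s = l2_normalize(student_feat)
    normed = [l2_normalize(t) for t in teacher_feats]
    dim = len(s)
    target = [sum(t[d] for t in normed) / len(normed) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(s, target)) / dim


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def student_loss(logits, label, student_feat, teacher_feats, lam=1.0):
    """Stage-2 objective: classification loss plus weighted feature distillation."""
    return cross_entropy(logits, label) + lam * multi_teacher_feature_loss(
        student_feat, teacher_feats
    )


# Hypothetical 3-D features from one student and two teachers.
print(student_loss([2.0, 0.5, -1.0], 0, [0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]))
```

Only feature representations flow from teachers to student here; the student's classifier is trained from scratch on class-balanced batches, which is the mechanism the abstract credits with limiting the propagation of classifier bias.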