Abstract:To investigate the robustness and trustworthiness of neural networks under security threats, this study focuses on their vulnerability to poisoning attacks. Based on a systematic analysis of the characteristics of type I and type II adversarial attacks, and in light of the structural deficiencies in neural network feature learning, the concept of type I poisoning attack is proposed. Theoretical modeling and analysis demonstrate fundamental feature-level distinctions between type I poisoning attacks and existing methods, such as “clean-label” or feature collision poisoning. A type I poisoned sample generation framework is built based on supervised variational autoencoders, and experiments on widely-used deep neural network architectures including ResNet50, VGG16, and MobileNetV2 are conducted. Results demonstrate that the proposed type I poisoning method effectively disrupts model classification decisions while preserving label consistency, successfully inducing misclassification across typical neural network architectures. Moreover, the defense experiments reveal that type I poisoning attacks can bypass existing mainstream defense mechanisms, rendering current primary countermeasures ineffective. With its strong stealth and disruptive capabilities, type I poisoning represents a novel security threat worthy of in-depth investigation. The development of this attack methodology holds significant implications for building more secure and robust neural network systems in the future.