Backdoor poisoned sample detection via reverse forgetting

doi:10.11918/202507065

Home > Archive>Volume 58, Issue 5, 2026 >116-125. DOI:10.11918/202507065

Backdoor poisoned sample detection via reverse forgetting
DOI:
                        10.11918/202507065
                    
CSTR:
                        
Author:
                        
Affiliation:(Nanjing University of Information Science and Technology, School of Computer Science, School of Cyber Science and Engineering, Nanjing 210044, China)
Clc Number:TP391
Fund Project:undefined

Article

Figures

Metrics

Reference

Cited by

Materials

Comments

Abstract:

To enhance model performance, Deep Neural Networks are frequently trained on untrusted datasets, rendering them vulnerable to data poisoning backdoor attacks. Conventional detection methods rely on identifying feature discrepancies between poisoned and benign samples. However, their effectiveness diminishes when attackers optimize trigger generation to obscure this boundary. To address this issue, this paper proposes a novel detection method named reverse forgeting (RFgt). The method exploits the characteristic of backdoor attacks, where the proportion of poisoned samples is low, and employs a reverse optimization strategy. Instead of forcing a poisoned model to forget backdoor features, RFgt compels it to rapidly forget the features of the majority class (benign samples), while simultaneously retaining and reinforcing the learning of suspicious samples to consolidate their poisoned features. This approach significantly amplifies the feature disparity between the two sample types. Ultimately, the prediction entropy of the samples is used to determine whether they are poisoned or benign. Experimental results demonstrate that RFgt effectively detects poisoned samples under various backdoor attacks on the CIFAR-10 and GTSRB datasets, while maintaining a low false positive rate. Furthermore, this method demonstrates strong generalization capability, as shown by its performance on the Tiny ImageNet dataset. Specifically, against four classic data poisoning attacks, RFgt achieves an average True Positive Rate (TPR) of 99.28% and a False Positive Rate (FPR) of only 0.06%, outperforming existing defense methods in overall performance.

Reference

Cited by

Get Citation

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:July 26,2025
Revised:
Adopted:
Online: May 28,2026
Published:

Publication Statement

Journal Subscription

Get Citation

Related Videos

Share

Article Metrics

History

Article QR Code