Backdoor poisoned sample detection via reverse forgetting

YAN Leiming; YOU Jianfei


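Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-chief: LI Longqiu (李隆球). ISSN 0367-6234. CN 23-1235/T.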


Cite this article: YAN Leiming, YOU Jianfei. Backdoor poisoned sample detection via reverse forgetting[J]. Journal of Harbin Institute of Technology, 2026, 58(5): 116. DOI: 10.11918/202507065

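Affiliation: School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China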

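Abstract:

To improve model performance, deep neural networks often have to be trained on untrusted datasets, which leaves them vulnerable to data-poisoning backdoor attacks. Conventional detection methods rely on identifying feature differences between poisoned and benign samples, but their effectiveness is limited when attackers optimize the trigger to blur this boundary. To address this problem, this paper proposes a reverse forgetting (RFgt) detection method. It exploits the fact that poisoned samples make up only a small fraction of the data in a backdoor attack and adopts a reverse optimization strategy: the poisoned model is forced to rapidly forget the features of the benign samples, which form the majority, while learning of suspicious samples is retained and reinforced to consolidate their poisoned features. This markedly amplifies the feature difference between the two kinds of samples, and the prediction entropy of each sample is finally used to decide whether it is poisoned. The study shows that RFgt effectively detects poisoned samples under a variety of backdoor attacks on the CIFAR-10 and GTSRB datasets while keeping the false detection rate on benign samples low; detection results on the Tiny ImageNet dataset demonstrate that the method generalizes well. Against four classic data-poisoning attacks, the method reaches an average detection true positive rate of 99.28% with a false positive rate of only 0.06%, and its overall performance exceeds that of existing defense methods.

Keywords: backdoor attack; data poisoning; sample detection; forgetting learning; predictive entropy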

DOI: 10.11918/202507065

CLC number: TP391

Document code: A

Funding: National Natural Science Foundation of China (2,7)
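The abstract states that the final decision rests on each sample's prediction entropy. The following is a minimal sketch of that last step only, not the authors' implementation. It assumes that, after reverse forgetting, the model stays confident only on poisoned samples, so low entropy marks a sample as suspicious; the abstract does not state the direction of the test. The function names, the toy softmax outputs, and the threshold value are all hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_poisoned(prob_rows, threshold):
    """Flag samples whose prediction entropy falls below `threshold`.

    Assumption (not stated in the abstract): after reverse forgetting the
    model remains confident only on poisoned samples, so LOW entropy is
    treated as the sign of a poisoned sample.
    """
    return [prediction_entropy(row) < threshold for row in prob_rows]


# Hypothetical softmax outputs for three samples over four classes.
probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform prediction, maximal entropy
    [0.40, 0.30, 0.20, 0.10],  # spread-out prediction, high entropy
]

# The threshold is illustrative; a real one would be tuned on held-out data.
flags = flag_poisoned(probs, threshold=0.5)
print(flags)
```

In this toy input only the first sample is flagged, because it is the only one whose predicted distribution is sharply peaked.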

Backdoor poisoned sample detection via reverse forgetting

YAN Leiming,YOU Jianfei

(Nanjing University of Information Science and Technology, School of Computer Science, School of Cyber Science and Engineering, Nanjing 210044, China)

Abstract:

To enhance model performance, deep neural networks are frequently trained on untrusted datasets, which leaves them vulnerable to data-poisoning backdoor attacks. Conventional detection methods rely on identifying feature discrepancies between poisoned and benign samples, but their effectiveness diminishes when attackers optimize trigger generation to obscure this boundary. To address this issue, this paper proposes a detection method named reverse forgetting (RFgt). The method exploits the fact that poisoned samples make up only a small proportion of the data in a backdoor attack, and it employs a reverse optimization strategy. Instead of forcing a poisoned model to forget backdoor features, RFgt compels it to rapidly forget the features of the majority class (benign samples) while retaining and reinforcing the learning of suspicious samples to consolidate their poisoned features. This significantly amplifies the feature disparity between the two sample types. The prediction entropy of each sample is then used to determine whether it is poisoned or benign. Experimental results show that RFgt effectively detects poisoned samples under various backdoor attacks on the CIFAR-10 and GTSRB datasets while maintaining a low false positive rate on benign samples. Results on the Tiny ImageNet dataset indicate that the method also generalizes well. Against four classic data-poisoning attacks, RFgt achieves an average true positive rate (TPR) of 99.28% and a false positive rate (FPR) of only 0.06%, outperforming existing defense methods in overall performance.

Key words: backdoor attack; data poisoning; sample detection; forgetting learning; predictive entropy
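The abstract reports detection quality as a true positive rate (TPR) and a false positive rate (FPR). As a reminder of how these two figures are defined, here is a small sketch that computes them from ground-truth poison labels and detector flags. The function name and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
def tpr_fpr(is_poisoned, flagged):
    """Return (TPR, FPR) for a poisoned-sample detector.

    TPR = flagged poisoned samples / all poisoned samples
    FPR = flagged benign samples   / all benign samples
    """
    pairs = list(zip(is_poisoned, flagged))
    tp = sum(1 for y, f in pairs if y and f)
    fn = sum(1 for y, f in pairs if y and not f)
    fp = sum(1 for y, f in pairs if not y and f)
    tn = sum(1 for y, f in pairs if not y and not f)
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example: 2 poisoned and 4 benign samples (illustrative labels only).
truth = [True, True, False, False, False, False]
flagged = [True, False, False, True, False, False]

tpr, fpr = tpr_fpr(truth, flagged)
print(f"TPR={tpr:.2%}, FPR={fpr:.2%}")
```

A detector with a high TPR and a near-zero FPR, as reported for RFgt, removes almost all poisoned samples while discarding very few benign ones.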
