Backdoor poisoned sample detection via reverse forgetting

Affiliation:

(School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China)

About the author:

Yan Leiming (闫雷鸣), born 1973, male, associate professor, master's supervisor

Corresponding author:

Yan Leiming, yan_leiming@163.com

CLC number:

TP391

Funding:

National Natural Science Foundation of China (2,7)

Abstract:

To improve model performance, deep neural networks are frequently trained on untrusted datasets, which leaves them vulnerable to data-poisoning backdoor attacks. Conventional detection methods rely on identifying feature differences between poisoned and benign samples, but their effectiveness is limited when attackers optimize the trigger to blur this boundary. To address this problem, this paper proposes a detection method named reverse forgetting (RFgt). The method exploits the fact that poisoned samples make up only a small proportion of the data in a backdoor attack, and adopts a reverse optimization strategy: instead of forcing the poisoned model to forget backdoor features, RFgt compels it to rapidly forget the features of the benign samples, which form the majority, while retaining and reinforcing what it learns from suspicious samples so as to consolidate their poisoned features. This significantly amplifies the feature difference between the two kinds of samples. Finally, the prediction entropy of each sample is used to decide whether it is poisoned. Experimental results show that RFgt effectively detects poisoned samples under a variety of backdoor attacks on the CIFAR-10 and GTSRB datasets while keeping the false positive rate on benign samples low, and detection results on the Tiny ImageNet dataset indicate that the method generalizes well. Against four classic data-poisoning attacks, RFgt achieves an average true positive rate (TPR) of 99.28% and a false positive rate (FPR) of only 0.06%, and its overall performance exceeds that of existing defense methods.
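The final decision step described in the abstract, scoring each sample by its prediction entropy and then reporting TPR and FPR, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that after reverse forgetting the model stays confident (low entropy) on poisoned samples and becomes uncertain (high entropy) on benign ones, which is an inference from the abstract rather than a stated rule. The function names, the toy probabilities, and the threshold of 0.5 are hypothetical.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of an (n_samples, n_classes) probability matrix."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=1)

def flag_poisoned(probs, threshold):
    """Assumed decision rule: flag a sample as poisoned when its entropy is below the threshold."""
    return prediction_entropy(probs) < threshold

def tpr_fpr(flags, is_poisoned):
    """True positive rate and false positive rate of the flags against ground-truth labels."""
    flags = np.asarray(flags, dtype=bool)
    is_poisoned = np.asarray(is_poisoned, dtype=bool)
    tpr = np.sum(flags & is_poisoned) / max(np.sum(is_poisoned), 1)
    fpr = np.sum(flags & ~is_poisoned) / max(np.sum(~is_poisoned), 1)
    return float(tpr), float(fpr)

# Toy softmax outputs (hypothetical): rows 0 and 2 are confident, rows 1 and 3 are near-uniform.
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.01, 0.97, 0.02],
    [0.30, 0.40, 0.30],
])
labels = np.array([True, False, True, False])  # hypothetical ground truth: True = poisoned

flags = flag_poisoned(probs, threshold=0.5)    # threshold chosen only for this toy example
tpr, fpr = tpr_fpr(flags, labels)
```

In practice the threshold would have to be chosen from the entropy distribution of the dataset under inspection; the abstract does not say how it is set.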

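Cite this article

YAN Leiming, YOU Jianfei. Backdoor poisoned sample detection via reverse forgetting[J]. Journal of Harbin Institute of Technology, 2026, 58(5): 116. DOI:10.11918/202507065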

History
  • Received: 2025-07-26
  • Published online: 2026-05-28