Random forest algorithm under differential privacy based on out-of-bag estimate
Authors: 李玉强, 陈鋆昊, 李琦, 刘爱华
Affiliation:

(1.School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; 2.School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China)

Author biography:

李玉强 (born 1977), male, associate professor, master's supervisor

Corresponding author:

陈鋆昊, 472769019@qq.com

CLC number:

TP391

    Abstract:

    Random forest algorithms under differential privacy classify high-dimensional data with unsatisfactory accuracy. To address this, the out-of-bag estimate under differential privacy is introduced to compute decision tree weights and feature weights, and a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB) is proposed. First, the algorithm generates part of the random forest under differential privacy and uses the out-of-bag estimate under differential privacy to evaluate the importance of the decision trees and features, from which the decision tree weights and feature weights are calculated. The features are then partitioned according to their weights to obtain the set of non-important features. Next, while the remaining part of the random forest is generated, every node whose best feature is a non-important feature is pre-pruned into a leaf node, which reduces noise and improves the classification accuracy of the decision trees while maintaining good execution efficiency. Finally, at prediction time, the classification result whose corresponding decision tree weight is the largest is taken as the result of the random forest, which improves the classification accuracy of the forest. The effectiveness and privacy of the algorithm are analyzed theoretically, and experimental results verify its effectiveness: the algorithm improves classification accuracy while protecting data privacy.
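    The abstract does not give the weighting formulas, so the following is only a minimal sketch of the three ideas it names: a noisy out-of-bag (OOB) accuracy used as a tree weight, a split of features into important and non-important sets, and a weight-based vote. Everything specific here is an assumption made for illustration and is not taken from the paper: the Laplace mechanism with sensitivity 1 on the count of correct OOB predictions, the OOB sample size being treated as public, the fixed threshold on feature weights, the summing of tree weights per class, and all function names.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_oob_weight(correct, total, epsilon, rng):
    """Noisy OOB accuracy of one tree, clipped to [0, 1].

    Assumption: the number of correctly classified OOB samples has
    sensitivity 1, so Laplace noise with scale 1/epsilon is added to it.
    The OOB sample size `total` is treated as public.
    """
    if total == 0:
        return 0.0
    noisy_correct = correct + laplace_noise(1.0 / epsilon, rng)
    return min(1.0, max(0.0, noisy_correct / total))


def non_important_features(feature_weights, threshold):
    """Features whose weight falls below a (hypothetical) threshold."""
    return {name for name, w in feature_weights.items() if w < threshold}


def weighted_vote(predictions, weights):
    """Return the label with the largest summed tree weight."""
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # One noisy weight per tree: (correct OOB predictions, OOB sample size).
    oob_counts = [(80, 100), (55, 100), (60, 100)]
    weights = [noisy_oob_weight(c, n, 1.0, rng) for c, n in oob_counts]
    print(weights)
    print(weighted_vote(["a", "b", "b"], weights))
```

    With weights such as 0.9, 0.3 and 0.4 for predictions "a", "b" and "b", this vote returns "a" although a plain majority vote would return "b", which is the kind of difference tree weighting is meant to produce. In a pre-pruning step, a node whose best split feature is in the set returned by `non_important_features` would be turned into a leaf instead of being split.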

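Cite this article

李玉强, 陈鋆昊, 李琦, 刘爱华. Random forest algorithm under differential privacy based on out-of-bag estimate[J]. Journal of Harbin Institute of Technology, 2021, 53(2): 146. DOI:10.11918/201912140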

History
  • Received: 2019-12-26
  • Published online: 2021-01-29