Sample selection algorithm for SVM with minimum uncertainty in local density

Affiliation:

(1. School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450011, China; 2. Hebei Water Conservancy Engineering Bureau Group Limited, Shijiazhuang 050021, China)

About the author:

Zhou Yu (born 1979), male, associate professor, master's supervisor

Corresponding author:

Zhou Yu, zhouyu_beijing@126.com

CLC number:

TP181

Funding:

National Natural Science Foundation of China (U2,0); Hebei Province Water Conservancy Science and Technology Plan Project (2022-64)

Abstract:

Support vector machine (SVM) training sets usually contain many redundant samples, which makes SVM computationally expensive on large-scale datasets. To address this, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The method selects the boundary data that most strongly influence the decision surface; by extracting the training samples likely to contain support vectors, it reduces computational cost and thereby improves SVM performance. First, the number of K mutual nearest neighbors and the Gaussian kernel density estimate are computed for each training sample. Second, the two quantities are summed to give the K local density of each sample point, and the density matrix is obtained. Third, using a local density uncertainty balance optimization method, the density matrix is subjected to a three-valued mapping, and the optimal threshold is taken as the one that minimizes the change in uncertainty; this threshold partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as the SVM training samples to build the classification model. The method was compared with six commonly used sample selection methods on UCI datasets, with accuracy and preservation rate as the performance metrics. The results indicate that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, effectively reducing the SVM training burden while improving classification performance.
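The first two steps of the abstract (K mutual nearest neighbor count plus Gaussian kernel density estimate, summed into a K local density) can be sketched in plain Python as below. This is only an illustrative sketch, not the authors' implementation: the function names, the bandwidth parameter, the omitted kernel normalization constant, and the convention that low-density points are treated as boundary data are all assumptions. The threshold is supplied by the caller; the mean density used in the demo is a placeholder and is not the uncertainty-balance rule described in the abstract.

```python
import math


def mutual_knn_counts(points, k):
    """For each point, count how many of its k nearest neighbours
    also have that point among their own k nearest neighbours."""
    n = len(points)
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbours.append(set(order[:k]))
    return [sum(1 for j in neighbours[i] if i in neighbours[j])
            for i in range(n)]


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate at each training point.
    The kernel's normalisation constant is omitted (an assumption)."""
    n = len(points)
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
                for q in points) / n
            for p in points]


def k_local_density(points, k, bandwidth):
    """Sum of the mutual-neighbour count and the KDE value per point."""
    counts = mutual_knn_counts(points, k)
    kde = gaussian_kde(points, bandwidth)
    return [c + d for c, d in zip(counts, kde)]


def split_by_threshold(density, threshold):
    """Assumed convention: points below the threshold are boundary data."""
    center = [i for i, d in enumerate(density) if d >= threshold]
    boundary = [i for i, d in enumerate(density) if d < threshold]
    return center, boundary


# Toy data: a tight square of four points plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
density = k_local_density(points, k=2, bandwidth=1.0)

# Placeholder threshold (mean density), NOT the paper's optimisation step.
threshold = sum(density) / len(density)
center, boundary = split_by_threshold(density, threshold)
```

In a full pipeline the `boundary` indices would select the reduced training set handed to an SVM trainer of the reader's choice; that step is left out here to keep the sketch dependency-free.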

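Cite this article

Zhou Yu, Liu Hongyu, Li Jingjing, Ding Hongqiang, Bai Lei. Sample selection algorithm for SVM with minimum uncertainty in local density[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 45. DOI:10.11918/202407085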

History
  • Received: 2024-07-30
  • Published online: 2025-08-11
  • Publication date: 2025-08-10