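Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-Chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T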

Cite this article: ZHOU Yu, LIU Hongyu, LI Jingjing, DING Hongqiang, BAI Lei. Sample selection algorithm for SVM with minimum uncertainty in local density[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 45. DOI:10.11918/202407085
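DOI: 10.11918/202407085
CLC number: TP181
Document code: A
Funding: National Natural Science Foundation of China (U2,0); Hebei Provincial Water Conservancy Science and Technology Program (2022-64)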
Sample selection algorithm for SVM with minimum uncertainty in local density
ZHOU Yu¹, LIU Hongyu¹, LI Jingjing², DING Hongqiang², BAI Lei¹
(1. School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450011, China; 2. Hebei Water Conservancy Engineering Bureau Group Limited, Shijiazhuang 050021, China)
Abstract:
Support vector machine (SVM) training sets usually contain many redundant samples, which makes training computationally expensive on large-scale datasets. To address this, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The method selects the boundary data that most influence the decision surface and extracts the training samples likely to contain support vectors, which reduces computational cost and improves SVM performance. First, the number of K mutual nearest neighbors and the Gaussian kernel density estimate are computed for each training sample. Second, the two quantities are summed to give the K local density of each sample point, and the density matrix is obtained. Third, a local density uncertainty balancing optimization method applies a three-valued mapping to the density matrix; the optimal threshold is the one at which the change in uncertainty is minimal, and it partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as the SVM training samples to build the classification model. The method was compared with six commonly used sample selection methods on UCI datasets, with accuracy and preservation rate as performance metrics. The results indicate that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, reducing the SVM training burden while improving classification performance.
Key words: support vector machine (SVM); sample selection; local density; balance of uncertainty; classification
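
The first two steps of the abstract (mutual K-nearest-neighbor counts plus a Gaussian kernel density estimate, summed into a per-sample local density) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the `bandwidth` parameter, and the plain unnormalized sum are assumptions made for illustration. The abstract does not detail the uncertainty balancing optimization, so the threshold here is simply passed in by hand, and treating low-density samples as boundary data is also an assumption.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: euclidean(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_neighbor_counts(points, k):
    """Count neighbors j of i for which i is also among j's k nearest neighbors."""
    nn = k_nearest(points, k)
    return [sum(1 for j in nn[i] if i in nn[j]) for i in range(len(points))]


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every sample point."""
    n, dim = len(points), len(points[0])
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** dim
    return [
        sum(math.exp(-euclidean(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points) / norm
        for p in points
    ]


def local_density(points, k, bandwidth):
    """Sum of mutual-neighbor count and kernel density (no rescaling: an assumption)."""
    counts = mutual_neighbor_counts(points, k)
    kde = gaussian_kde(points, bandwidth)
    return [c + g for c, g in zip(counts, kde)]


def split_by_threshold(density, threshold):
    """Assumed rule: samples below the threshold are boundary data, the rest center data."""
    center = [i for i, d in enumerate(density) if d >= threshold]
    boundary = [i for i, d in enumerate(density) if d < threshold]
    return center, boundary


if __name__ == "__main__":
    # Four clustered points and one isolated point (toy data, not from the paper).
    samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    density = local_density(samples, k=2, bandwidth=1.0)
    center_idx, boundary_idx = split_by_threshold(density, threshold=1.0)
    print("center:", center_idx, "boundary:", boundary_idx)
```

In the described pipeline, the indices returned as boundary data would then select the reduced training set passed to an SVM trainer. The hand-picked threshold stands in for the optimal threshold that the paper obtains from its three-valued mapping and uncertainty minimization.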
