Sample selection algorithm for SVM with minimum uncertainty in local density

ZHOU Yu; LIU Hongyu; LI Jingjing; DING Hongqiang; BAI Lei


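Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T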


Cite this article: ZHOU Yu, LIU Hongyu, LI Jingjing, DING Hongqiang, BAI Lei. Sample selection algorithm for SVM with minimum uncertainty in local density[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 45. DOI: 10.11918/202407085

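ZHOU Yu¹, LIU Hongyu¹, LI Jingjing², DING Hongqiang², BAI Lei¹
(1. School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450011, China; 2. Hebei Water Conservancy Engineering Bureau Group Limited, Shijiazhuang 050021, China)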

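Abstract:

The training set of a support vector machine (SVM) usually contains many redundant samples, so computational complexity limits the use of SVM on larger datasets. To address this problem, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The method selects the boundary data that most strongly influence the decision surface; by extracting the training samples likely to contain support vectors, it reduces computational cost and thereby improves SVM performance. First, the number of K mutual nearest neighbors and the Gaussian kernel density estimate are computed for each training sample. Second, the two quantities are summed to give the K local density of each sample point, from which the density matrix is obtained. Third, a local density uncertainty balance optimization method applies a three-valued mapping to the density matrix; the optimal threshold is the one at which the change in uncertainty is minimized, and this threshold partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as the SVM training samples to build the classification model. The method was compared experimentally with six other commonly used sample selection methods on UCI datasets, with accuracy and preservation rate as performance indicators. The results show that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, effectively reducing the SVM training burden while improving classification performance.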
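The first two steps of the abstract (mutual-neighbor count plus Gaussian kernel density, summed into a local density) can be illustrated with a minimal pure-Python sketch. The abstract gives no formulas, so everything concrete here is an assumption: the function names, the Euclidean metric, the `bandwidth` parameter, the omission of the kernel's normalizing constant, and the unweighted sum of the two terms. It is not the authors' implementation.

```python
import math

def pairwise_sq_dists(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def knn_sets(sq_dists, k):
    """For each point, the index set of its k nearest neighbors (self excluded)."""
    sets = []
    for i, row in enumerate(sq_dists):
        others = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        sets.append(set(others[:k]))
    return sets

def mutual_knn_counts(sq_dists, k):
    """Count neighbors j of i such that i is also among the k nearest neighbors of j."""
    nn = knn_sets(sq_dists, k)
    return [sum(1 for j in nn[i] if i in nn[j]) for i in range(len(nn))]

def gaussian_kde(sq_dists, bandwidth):
    """Mean Gaussian kernel value at each sample (assumption: no normalizing
    constant, and each sample's own kernel term is included)."""
    n = len(sq_dists)
    return [sum(math.exp(-d / (2.0 * bandwidth ** 2)) for d in row) / n
            for row in sq_dists]

def local_density(points, k=3, bandwidth=1.0):
    """Assumed form of the K local density: mutual-neighbor count + kernel density."""
    d = pairwise_sq_dists(points)
    return [c + g for c, g in zip(mutual_knn_counts(d, k), gaussian_kde(d, bandwidth))]

# Four corners of a unit square plus one distant point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
dens = local_density(pts, k=2, bandwidth=1.0)
```

In this toy input the isolated point has no mutual neighbors and a small kernel term, so it receives the lowest density of the five samples.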

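Keywords: support vector machine (SVM); sample selection; local density; uncertainty balance; classification

DOI: 10.11918/202407085

Classification number: TP181

Document code: A

Funding: National Natural Science Foundation of China (U2,0); Hebei Provincial Water Conservancy Science and Technology Program (2022-64)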

Sample selection algorithm for SVM with minimum uncertainty in local density

ZHOU Yu¹, LIU Hongyu¹, LI Jingjing², DING Hongqiang², BAI Lei¹

(1. School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450011, China; 2. Hebei Water Conservancy Engineering Bureau Group Limited, Shijiazhuang 050021, China)

Abstract:

Support vector machine (SVM) training sets frequently contain a large number of redundant samples, and the resulting computational complexity limits the use of SVM on large-scale datasets. To address this issue, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The approach selects the boundary data that most strongly affect the decision boundary; by extracting the training samples likely to contain support vectors, it reduces computational cost and thereby improves SVM performance. First, the number of K mutual nearest neighbors and the Gaussian kernel density estimate of each training sample are computed. Second, these two quantities are summed for each sample point to give its K local density, from which the density matrix is obtained. Third, using the local density uncertainty balance optimization method, the density matrix undergoes a three-valued mapping, and the threshold at which the change in uncertainty is minimized is taken as the optimal threshold; this threshold partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as training samples for the SVM to build the classification model. The method was compared with six commonly used sample selection techniques on UCI datasets, with accuracy and preservation rate as performance metrics. The results indicate that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, effectively reducing the training burden on the SVM while improving its classification performance.
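The final partition step and the preservation-rate metric can be sketched as follows. This is only an illustration under stated assumptions: the threshold is passed in as a fixed number rather than found by the paper's uncertainty-balance optimization (which the abstract does not specify), low density is assumed to mark boundary data, and the preservation rate is assumed to be the fraction of training samples kept. The function names are hypothetical.

```python
def split_by_density(densities, threshold):
    """Partition sample indices into (boundary, center).

    Assumption: samples whose local density falls below `threshold` are
    treated as boundary data. The threshold is supplied by the caller; the
    paper's search for the optimal threshold is not reproduced here.
    """
    boundary = [i for i, d in enumerate(densities) if d < threshold]
    center = [i for i, d in enumerate(densities) if d >= threshold]
    return boundary, center

def preservation_rate(n_selected, n_total):
    """Assumed definition: fraction of the original training samples retained."""
    return n_selected / n_total

# Hypothetical local densities for five training samples.
densities = [2.5, 2.5, 0.4, 2.6, 0.9]
boundary, center = split_by_density(densities, threshold=1.0)
rate = preservation_rate(len(boundary), len(densities))
```

Only the samples indexed by `boundary` would then be passed to the SVM trainer, which is where the reduction in training cost described in the abstract comes from.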

Key words: support vector machine (SVM); sample selection; local density; balance of uncertainty; classification
