| Cite this article: | ZHOU Yu, LIU Hongyu, LI Jingjing, DING Hongqiang, BAI Lei. Sample selection algorithm for SVM with minimum uncertainty in local density[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 45. DOI: 10.11918/202407085 |
|
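| Abstract: |
| Support vector machines (SVM) usually carry a large number of redundant samples in classification, so their computational cost limits them on larger datasets. To address this, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The method selects the boundary data that most influence the decision surface; by extracting the training samples likely to contain support vectors, it reduces computational overhead and thereby improves SVM performance. First, the number of K mutual nearest neighbors and the Gaussian kernel density estimate are computed for the training samples. Second, the two quantities are summed to give the K local density of each sample point, from which the density matrix is obtained. Third, using the local density uncertainty balance optimization method, the density matrix is mapped to three values, the optimal threshold is taken where the change in uncertainty is smallest, and the density matrix is partitioned into center data and boundary data. Finally, the boundary data are extracted and used as the SVM training samples to build the classification model. The method is compared experimentally with six other commonly used sample selection methods on UCI datasets, with accuracy and preservation rate as performance metrics. The results show that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, effectively reducing the SVM training burden while improving classification performance. |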
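| Keywords: support vector machine (SVM); sample selection; local density; uncertainty balance; classification |
| DOI: 10.11918/202407085 |
| Classification number: TP181 |
| Document code: A |
| Funding: National Natural Science Foundation of China (U2,0); Hebei Province Water Conservancy Science and Technology Program (2022-64) |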
|
| Sample selection algorithm for SVM with minimum uncertainty in local density |
|
ZHOU Yu¹, LIU Hongyu¹, LI Jingjing², DING Hongqiang², BAI Lei¹
|
|
(1. School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450011, China; 2. Hebei Water Conservancy Engineering Bureau Group Limited, Shijiazhuang 050021, China)
|
| Abstract: |
| To address the problem that support vector machines (SVM) usually carry a large number of redundant samples during classification, which makes training computationally expensive on large-scale datasets, an SVM sample selection algorithm based on minimum uncertainty in local density is proposed. The approach selects the boundary data that most strongly affect the decision boundary and reduces computational cost by extracting the training samples likely to contain support vectors, thereby improving SVM performance. Firstly, the number of K mutual nearest neighbors and the Gaussian kernel density estimate of the training samples are computed. Secondly, the two quantities are summed for each sample point to give its K local density, from which the density matrix is obtained. Subsequently, using the local density uncertainty balance optimization method, the density matrix undergoes a three-valued mapping, and the optimal threshold is taken where the change in uncertainty is minimal; this threshold partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as training samples for the SVM to build the classification model. The method was compared with six commonly used sample selection techniques on UCI datasets, using accuracy and preservation rate as performance metrics. The results indicate that the proposed algorithm quickly separates center data from boundary data and removes a large number of redundant training samples, effectively reducing the training burden on the SVM while improving its classification performance. |
| Key words: support vector machine (SVM); sample selection; local density; uncertainty balance; classification |
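
The selection pipeline described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The K local density is taken as the plain sum of the K mutual nearest neighbor count and an unnormalized Gaussian kernel density estimate, as the abstract states, with no rescaling of either term. The abstract does not spell out the uncertainty balance optimization, so a simple stand-in is used: keep the `n_keep` lowest-density samples. Treating low density as "boundary" and defining preservation rate as the fraction of samples retained are also assumptions. All function names and the `bandwidth` and `n_keep` parameters are hypothetical.

```python
import math


def mutual_knn_counts(points, k):
    """Count, for each point i, the neighbors j with j in kNN(i) and i in kNN(j)."""
    n = len(points)
    knn = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    return [sum(1 for j in knn[i] if i in knn[j]) for i in range(n)]


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density at each point (mean kernel value, unnormalized)."""
    n = len(points)
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points) / n
        for p in points
    ]


def k_local_density(points, k, bandwidth):
    """K local density = mutual-neighbor count + kernel density (assumed plain sum)."""
    counts = mutual_knn_counts(points, k)
    kde = gaussian_kde(points, bandwidth)
    return [c + d for c, d in zip(counts, kde)]


def select_boundary(density, n_keep):
    """Stand-in threshold (NOT the paper's uncertainty balance optimization):
    use the n_keep-th smallest density as threshold and keep samples at or below it."""
    threshold = sorted(density)[n_keep - 1]
    return [i for i, d in enumerate(density) if d <= threshold]


def preservation_rate(selected, n_total):
    """Fraction of the original training samples retained (assumed definition)."""
    return len(selected) / n_total


# Toy one-dimensional data: a loose cluster plus one isolated sample.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (10.0,)]
density = k_local_density(points, k=2, bandwidth=1.0)
boundary = select_boundary(density, n_keep=3)
print(boundary, preservation_rate(boundary, len(points)))
```

In a full pipeline the samples indexed by `boundary` would then be handed to an SVM trainer (for example scikit-learn's `sklearn.svm.SVC`), and accuracy would be measured on held-out data alongside the preservation rate.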