Abstract: The training sets used for support vector machine (SVM) classification often contain a considerable number of redundant samples, which inflates the computational cost of SVM on large-scale datasets. To address this issue, an SVM sample selection algorithm based on local density minimum uncertainty is proposed. The approach identifies the boundary data points that most strongly influence the decision boundary and isolates these potential support vectors from the training set, thereby reducing computational cost and improving SVM’s overall effectiveness. First, the number of K nearest neighbors and the Gaussian kernel density estimate are computed for each training sample. Second, the two quantities are summed for each sample point to obtain its K local density, and these values form the density matrix. Third, using the local density uncertainty balancing optimization method, the density matrix undergoes a triple-mapping process that minimizes the change in uncertainty and yields the optimal threshold, which partitions the density matrix into center data and boundary data. Finally, the boundary data are extracted and used as the training samples for the SVM, from which the classification model is built. To evaluate the method experimentally, we compared it with six commonly used sample selection techniques on UCI datasets, using accuracy and preservation rate as the performance metrics. The results indicate that the proposed method significantly reduces the number of redundant training samples, thereby alleviating the training burden on the SVM and enhancing its classification performance.
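The selection pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm. The "number of K nearest neighbors" is read here as the count of same-class points among a sample's k nearest neighbors. The kernel density estimate uses an isotropic Gaussian with a hypothetical `bandwidth` parameter. Most importantly, the uncertainty balancing optimization and triple-mapping step are not reproduced: a simple stand-in keeps the `keep_fraction` of samples with the lowest local density. The function names `local_density` and `select_boundary` are illustrative only.

```python
import math


def local_density(X, y, k=3, bandwidth=1.0):
    """Score each sample as (same-class count among its k nearest
    neighbors) + (Gaussian kernel density estimate at the sample)."""
    n, dim = len(X), len(X[0])
    # Normalizing constant of an isotropic Gaussian KDE.
    norm = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim
    scores = []
    for i in range(n):
        dists = [math.dist(X[i], X[j]) for j in range(n)]
        # Indices of the k nearest neighbors, excluding the sample itself.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dists[j])[:k]
        # ASSUMPTION: "number of K nearest neighbors" = same-class count.
        knn_count = sum(1 for j in neighbors if y[j] == y[i])
        kde = sum(math.exp(-0.5 * (d / bandwidth) ** 2) for d in dists) / norm
        scores.append(knn_count + kde)
    return scores


def select_boundary(X, y, k=3, bandwidth=1.0, keep_fraction=0.25):
    """Return sorted indices of the lowest-density samples.

    STAND-IN: a fixed fraction replaces the optimal threshold that the
    paper derives by uncertainty balancing; that step is not implemented.
    """
    scores = local_density(X, y, k, bandwidth)
    order = sorted(range(len(X)), key=scores.__getitem__)
    n_keep = max(1, round(keep_fraction * len(X)))
    return sorted(order[:n_keep])


if __name__ == "__main__":
    # Toy 1-D data: two classes that meet between x = 3.0 and x = 3.5.
    X = [[0.0], [1.0], [2.0], [3.0], [3.5], [4.5], [5.5], [6.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(select_boundary(X, y, k=2, keep_fraction=0.25))
```

On this toy set the two samples adjacent to the class interface receive the lowest scores, because one of their two nearest neighbors belongs to the other class. The selected subset would then be passed to any SVM trainer (for example, scikit-learn's `SVC`) in place of the full training set.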