Abstract
Many real-world machine learning applications must cope with data that change over time, known as concept drift, and with data indeterminacy, where it is unrealistic to expect all true labels to be available. Both can reduce the accuracy of prediction models. This study introduces a new drift detection approach based on neutrosophic set theory. The approach takes into account uncertainty in the prediction model and handles indeterminate information, considering its impact on the model's performance. The proposed method reads data into windows and calculates a set of values based on the concept of neutrosophic membership. These values are then used in the Neutrosophic Support Vector Machine (N-SVM). To address the issue of indeterminate true labels, the values produced by N-SVM are expressed as entropy and used as input for the ADWIN (Adaptive Windowing) change detector. When a drift is detected, the prediction model is retrained on the original training data set together with only the most recent instances. The proposed method gives promising results in terms of drift detection accuracy compared with existing drift detection methods such as KSWIN, ADWIN, and DWM.
Keywords
0 Introduction
Machine learning is becoming increasingly common in real-world applications. Many industries now use machine learning algorithms to process large amounts of continuously arriving data whose underlying distribution changes over time. This change is called concept drift. As described in the literature, this phenomenon can lead to poor prediction performance[1-2]. Although some existing learning algorithms can detect concept drift when all labels are available[3-5], in most cases it is unrealistic to access all labels[6]. To address this limitation, this study aims to develop a drift detection approach for situations where labels may be uncertain and that considers the influence of current inputs on the prediction model. Several methods have been proposed to detect concept drift.
One approach assumes that real labels can be used for retraining and drift detection. This process is known as active learning, where certain decisions are made to provide labels[7-9]. These labels are then used to update the model when concept drift is detected, ensuring that the model remains relevant and accurate over time. This approach is particularly useful in complex situations where the data distribution is highly susceptible to change. However, it may face challenges when there is not enough new data to support the update of the model after concept drift occurs, resulting in poor predictions.
Another approach to detecting concept drift is error-based detection, which uses real-time true labels to identify anomalies and adjust training accordingly[10-11]. Commonly used algorithms in this category include the Page-Hinckley test[12], adaptive windowing (ADWIN)[13], and control charts[14]. The Page-Hinckley test[12] is known for its simplicity and sensitivity in detecting drift. However, it assumes that changes in the mean always indicate drift, which may not be true in all data sets. Additionally, this method has limitations related to its assumptions and label dependency, which may require domain knowledge. ADWIN[13] has shown good accuracy in real-time detection, but its performance can be influenced by factors such as the rate at which concept drift occurs. It is also important to note that while ADWIN can detect changes in data distribution, it may not always be able to adapt perfectly to these changes, especially if they are complex.
Control charts[14] are a prediction error measure commonly used in continual learning to detect concept drift. The experiments used simulated streams of metadata to enable real-time change detection and integration. The results demonstrate that control charts serve as a reliable indicator for detecting concept drift. However, it is worth noting that the learning model may have a tendency to prioritize newly acquired knowledge over past knowledge, potentially leading to the forgetting of previously learned concepts.
Additionally, some methods can be used with error-based detection to monitor the classification error rate and detect drift, such as Dynamic Weighted Majority (DWM)[15]. DWM has been shown to outperform learners that incrementally learn concept descriptions, that store and reuse previously encountered examples, or that employ an unweighted, fixed-size ensemble of experts. This method is specifically designed for tracking concept drift. However, the performance of DWM may be affected by factors such as the noise level and the sequence of concept drifts.
Concept drift can also be detected by analyzing contextual information, which depends on the input and requires actual labels only when deviations are detected. One method for detecting drift is to monitor the sample rate within the support vector machine solution limits[16-17]. Ayad[16] proposed an approach to recognize and handle concept changes using support vector machines. This approach integrates a mechanism that utilizes only recent and representative patterns to update the SVMs without causing forgetting. However, in a dynamic environment, data characteristics may evolve over time, leading to a significant decline in the performance of this approach, because the retained data may no longer be consistent with the characteristics of the new incoming data. Nyati et al.[17] compared the accuracy of various classifiers with and without concept drift. When concept drift is present, the maximum accuracy is 90.4%. The authors noted that frequent occurrence of concept drift in a data stream significantly decreased the accuracy.
Another approach is to use the Kolmogorov-Smirnov test[18], which has also been successfully used to detect drift. The main contribution of that work is the use of the Incremental Kolmogorov-Smirnov test to identify concept drifts without the need for true labels. The study demonstrates a strong level of accuracy; however, the alpha parameter in KSWIN is highly sensitive and should be set below 0.01. The effectiveness of KSWIN can therefore be greatly influenced by the selection of alpha, which may require precise calibration.
On the other hand, the UDD approach[19] does not rely on true labels for drift detection. This is achieved using deep neural networks and Monte Carlo methods to retrain the model when it drifts. UDD outperforms other state-of-the-art strategies, such as KSWIN, in terms of accuracy on two synthetic datasets and ten real-world datasets for both regression and classification tasks. However, UDD utilizes Monte Carlo Dropout in conjunction with uncertainty estimates from a deep neural network. This means that the network must be trained with dropout, which may not always be feasible.
Another available approach to drift detection is type-driven[20]. This method aims to address concept drift in data streams by identifying the specific type of drift and using this information to improve the accuracy of adaptation. The authors conducted experiments on both synthetic and real-world data, and the results demonstrated that accurately identifying the type of drift can significantly enhance the accuracy of adaptation. However, the drift type identifier is pre-trained on data with known drift types. This could limit the effectiveness of the model on real-world data sets, where the types of all drift points may not be known.
There are methods to detect drift without requiring access to labels[21]. However, with this type of methodology, knowledge of changes is limited, which can reduce the effectiveness of drift detection methods.
In this paper, we propose a method that is more suitable for the characteristics of real data, where data streams may have uncertain real labels and structural changes.
Our goal is to improve concept drift detection by considering the impact of current input data on the prediction model. The proposed approach uses neutrosophic set theory and feeds neutrosophic entropy into an adaptive window to detect concept changes over time in data streams with uncertain true labels.
The rest of the paper is organized as follows: Section 1 describes the materials and methods used; Section 2 presents the proposed concept drift detection methodology; Section 3 presents the experimental results and compares them with the results of existing methods; finally, Section 4 concludes the paper and discusses future research directions.
1 Materials and Methods
1.1 Neutrosophic Support Vector Machine (N-SVM)
Neutrosophic set theory is a powerful tool for classification, particularly in cases where the data are imprecise or uncertain[22]. The N-SVM method is based on the concept of neutrosophic sets, whose elements have varying degrees of truth, falsity, and indeterminacy. This allows N-SVM to quantify indeterminacy and make predictions. The definition of N-SVM builds on that of the Support Vector Machine (SVM)[23]. To use N-SVM, we start with a set of M training data points (xi, yi), where xi is a multidimensional vector and yi is the corresponding class. In N-SVM, each point xi belonging to class yi carries a neutrosophic triple {T (xi) , F (xi) , I (xi) } that represents the truth, falsity, and indeterminacy values of the point for that class[24]. A data sample xi is written as {T (xi) , F (xi) , I (xi) }/xi, where T (xi) , F (xi) , and I (xi) are the membership values to the truth, false, and indeterminate sets, respectively. T (xi) measures the degree to which the sample belongs to its labeled class, I (xi) the degree of indiscrimination between classes, and F (xi) the degree to which the sample belongs to the outliers.
To determine the optimal hyperplane, the margin is relaxed by defining a set of non-negative variables {θ1, θ2, ···, θM}, one per training point. N-SVM then finds an optimal solution by optimizing w under the following constraint.
For each element xi, we have:
(1)
where w is the weight vector of the optimal hyperplane. The optimal hyperplane is then determined as:
(2)
where C is a constant that balances between the maximum margin and the minimum error.
1.2 Adaptive WINdowing (ADWIN)
ADWIN is an adaptive windowing algorithm designed to detect changes and maintain statistics for a data stream. It maintains an adaptive window, which is crucial for the initial machine learning model, and continuously updates this window as long as no concept drift is detected. We selected this algorithm because it has been shown to handle real data effectively without prior knowledge of the data distribution or model structure[25].
1.3 Entropy for Neutrosophic Sets
Entropy is a measure of the uncertainty present in a set of information. This uncertainty can be attributed to two sources: partial belongingness and non-belongingness, and indeterminacy. For any neutrosophic set X, a sample is represented as {T (xi) , F (xi) , I (xi) }/ xi, where T (xi) , F (xi) and I (xi) represent the membership values for true, false, and indeterminate labels, respectively. The entropy of a neutrosophic set X is calculated using the formula[26]:
(3)
In our method, the entropy is used as an input for the ADWIN change detector.
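As an illustration of how such an entropy value could be computed, the sketch below uses one commonly cited definition of neutrosophic-set entropy, E = 1 - (1/n) Σ (T + F)·|I - (1 - I)|. Whether this matches Eq. (3) exactly is an assumption, and the function name `neutrosophic_entropy` is hypothetical. Under this definition the value stays in [0, 1] when T + F ≤ 1.

```python
from typing import Iterable, Tuple

# (T, F, I) membership values, each assumed to lie in [0, 1].
Triple = Tuple[float, float, float]


def neutrosophic_entropy(samples: Iterable[Triple]) -> float:
    """Entropy of a neutrosophic set under an ASSUMED definition:

        E = 1 - (1/n) * sum_i (T_i + F_i) * |I_i - (1 - I_i)|

    The paper's Eq. (3) may differ; this is only a sketch.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one (T, F, I) triple is required")
    total = 0.0
    for t, f, i in samples:
        # |I - I^c| with I^c = 1 - I is zero when I = 0.5 (maximal indeterminacy).
        total += (t + f) * abs(i - (1.0 - i))
    return 1.0 - total / len(samples)
```

Passing a single triple gives a per-prediction uncertainty value, which is the form a change detector fed one value per instance would need.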
2 Methodology
The proposed method takes into account the impact of the indeterminate current input data on the prediction model. Fig.1 outlines the steps in the proposed workflow. First, the data stream is read in windows and represented with neutrosophic components (T, F, I) . Given a set of N training data, each instance insti is represented as {T (insti) , F (insti) , I (insti) }/insti, where T (insti) , F (insti) , and I (insti) represent the membership values for true, false, and indeterminate labeling, respectively[24]. To make predictions, an N-SVM was used, as it demonstrated the best performance during our experiments. The values generated by the N-SVM are expressed as entropy and used as input for the ADWIN change detector. When a drift is detected, the prediction model is retrained on the original training data set (NDtr) together with only the most recent instances (Drecent) . This ensures that the model is updated with the most current data while still having enough data for good accuracy.
Fig.1 Steps in the proposed Neutrosophic Drift Detection (N-DD) method
The pseudocode of the proposed method, Neutrosophic Drift Detection (N-DD), is outlined below:
Algorithm: N-DD
Input: Neutrosophic data stream (ND) and neutrosophic training data set (NDtr)
while the neutrosophic data stream ND has not ended do
    Receive instance instt (T, F, I)
    Npt, NUnct ← predict instt
    Add NUnct (uncertainty as entropy) to ADWIN
    if ADWIN detects a change then
        Receive recent labels Drecent
        Retrain the model with (NDtr and Drecent)
    end if
end while
Output: Neutrosophic prediction Npt at time t
In the above algorithm, NUnct stands for neutrosophic uncertainty, and Npt stands for neutrosophic prediction at time t.
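The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict`, `retrain`, and `fetch_recent` are hypothetical callbacks standing in for N-SVM prediction, retraining on NDtr plus Drecent, and label acquisition, and `MeanShiftDetector` is a simple stand-in rather than ADWIN. Its method names (`add_element`, `detected_change`, `reset`) were chosen to mirror the detector interface of scikit-multiflow.

```python
from collections import deque
from statistics import mean
from typing import Any, Callable, Iterable, List, Tuple


class MeanShiftDetector:
    """Stand-in change detector (NOT ADWIN): compares the means of the older
    and newer halves of a fixed-size window of recent values."""

    def __init__(self, window: int = 20, threshold: float = 0.2) -> None:
        if window < 2:
            raise ValueError("window must hold at least two values")
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def add_element(self, value: float) -> None:
        self.values.append(value)

    def detected_change(self) -> bool:
        if len(self.values) < self.values.maxlen:
            return False  # wait until the window is full
        vals = list(self.values)
        half = len(vals) // 2
        return abs(mean(vals[:half]) - mean(vals[half:])) > self.threshold

    def reset(self) -> None:
        self.values.clear()


def run_ndd(
    stream: Iterable[Any],
    predict: Callable[[Any], Tuple[Any, float]],
    retrain: Callable[[Any], None],
    fetch_recent: Callable[[int], Any],
    detector: MeanShiftDetector,
) -> Tuple[List[Any], List[int]]:
    """Run the N-DD loop; returns the predictions and the drift positions."""
    predictions: List[Any] = []
    drifts: List[int] = []
    for t, inst in enumerate(stream):
        label, entropy = predict(inst)      # Npt, NUnct
        predictions.append(label)
        detector.add_element(entropy)       # uncertainty fed to the detector
        if detector.detected_change():
            drifts.append(t)
            retrain(fetch_recent(t))        # callback combines NDtr and Drecent
            detector.reset()
    return predictions, drifts
```

Labels are requested only inside the `if` branch, which is the property that lets the loop run on a stream whose true labels are mostly unavailable.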
3 Experimental Work and Results
3.1 Data Sets Description
We conducted our experiments using widely-used data sets in concept drift research. All of the data sets used in this study are publicly available on open platforms such as OpenML, Kaggle, and the USP Data Stream Repository.
These data sets are used to evaluate the performance of data stream classification and drift detection. They possess various characteristics, including binary class, multi-class, temporal dependencies, and imbalanced data, making them well-suited for comparing different drift detection approaches and demonstrating the effectiveness of the proposed method, N-DD. Table 1 presents the data sets used in this study. It includes four real data sets for binary classification with temporal dependencies: Electricity (ELE), Ozone (OZO)[27], GMSC (GMS)[28], and Airlines (AIR)[29]; of these, OZO is also imbalanced. Keystroke (KEY)[30] is included as a real multi-class data set with both temporal dependencies and imbalances. Additionally, Gas Sensor (GAS)[27] and Covertype (COV)[31] are incorporated as real-world multi-class data sets with temporal dependencies and imbalances, respectively.
Table 1 Data sets
3.2 Parameters Setup
The purpose of this study is to analyze how current input data affect the properties of prediction models, rather than to focus solely on the parameters of the models themselves. To this end, we conducted experiments to select parameters that avoid unnecessary retraining while producing the most accurate results. The parameters were therefore configured with the trade-off between detection accuracy and the number of retrainings in mind. scikit-learn and scikit-multiflow were used in the implementation.
The data stream was divided into three parts: 10% for training, 20% for validation, and 5% for data evaluation. The first 10% of the data stream points were used for training the model. Validation for ADWIN was performed using the following 20% of the data stream. A change was detected when the absolute difference in means between two sub-windows of recent observations was greater than the constant alpha. The sub-window width was experimentally fixed at 5% of the data stream, and the alpha value was set to 0.002 if no drift was detected in the validation data. When a change was detected, the most recent data points and their labels, amounting to 5% of the overall data stream length, were used for retraining.
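The partitioning and the detection rule described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and treating the two sub-windows as the two most recent adjacent blocks of equal width is an assumption, since the text does not state how they are positioned.

```python
from statistics import mean
from typing import Sequence, Tuple


def partition_stream(n: int) -> Tuple[range, range, int]:
    """Index ranges for training (first 10%) and validation (next 20%),
    plus the sub-window / retraining width (5% of the stream)."""
    train_end = n * 10 // 100
    val_end = train_end + n * 20 // 100
    width = max(1, n * 5 // 100)
    return range(0, train_end), range(train_end, val_end), width


def change_detected(values: Sequence[float], width: int, alpha: float = 0.002) -> bool:
    """True when the means of the two most recent adjacent sub-windows
    differ by more than alpha (sub-window placement is an assumption)."""
    if len(values) < 2 * width:
        return False
    older = values[-2 * width:-width]
    newer = values[-width:]
    return abs(mean(older) - mean(newer)) > alpha


def retraining_indices(t: int, width: int) -> range:
    """Indices of the last `width` points up to and including position t."""
    return range(max(0, t - width + 1), t + 1)
```

For a stream of 1000 points this yields 100 training points, 200 validation points, and sub-windows of 50 points each.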
The experimental setup was as follows:
The N-SVM parameter C was searched in the range $[10^{-2}, 10^{2}]$ with a step size of $10^{-1}$. In addition, w = 0.125.
The Kolmogorov-Smirnov test-based drift detector KSWIN does not require true labels to detect concept drift; it uses them only for retraining the model when a drift occurs. KSWIN requires a suitable alpha value to determine the sensitivity of concept drift detection[32]. We optimized alpha using the same procedure as for N-DD. The scikit-multiflow library was used for KSWIN with a window size of 200 and a stat_size of 100.
The Dynamic Weighted Majority (DWM) is an ensemble method for concept drift. It maintains an ensemble of base learners, and the ensemble dynamically creates and deletes experts when a change is detected. Unlike KSWIN, DWM relies on continuously obtaining true labels over time in order to be able to detect drift. The parameters of DWM were set to typical values of β=0.5, θ=0.01, and p=1, as mentioned in Ref.[15].
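To make the roles of β, θ, and p concrete, the following sketch shows one weight-update step of DWM as it is usually described: experts that err have their weight multiplied by β, weights are rescaled so the largest is 1, experts below θ are dropped, and a new expert is requested when the weighted vote is wrong. The function name and return shape are hypothetical, at least one expert is assumed, and `update=True` corresponds to a step where the instance index is a multiple of p (always the case for p = 1).

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def dwm_step(
    weights: Sequence[float],
    expert_preds: Sequence[Hashable],
    y_true: Hashable,
    beta: float = 0.5,
    theta: float = 0.01,
    update: bool = True,
) -> Tuple[Hashable, List[Tuple[int, float]], bool]:
    """One DWM step.

    Returns (global prediction, surviving (expert index, weight) pairs,
    whether a new expert should be added).
    """
    new_weights = list(weights)
    votes = defaultdict(float)
    for j, pred in enumerate(expert_preds):
        if update and pred != y_true:
            new_weights[j] *= beta          # penalize the wrong expert
        votes[pred] += new_weights[j]       # weighted vote for its class
    global_pred = max(votes, key=votes.get)
    if not update:
        return global_pred, list(enumerate(new_weights)), False
    top = max(new_weights)
    new_weights = [w / top for w in new_weights]   # largest weight becomes 1
    survivors = [(j, w) for j, w in enumerate(new_weights) if w >= theta]
    return global_pred, survivors, global_pred != y_true
```

Training the experts on the labeled instance, which follows this step in DWM, is omitted here; it is the part that requires true labels for every instance.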
3.3 Evaluation Metrics
Real-world data often do not have clearly defined drift points, which makes it challenging to measure the accuracy of drift detection. Because the actual drift points are unknown, we must rely on a decrease in prediction performance over time as an indication of drift[33]. To address this issue, we use the Matthews Correlation Coefficient (MCC) in this study, as it is able to handle real and imbalanced data[34]. A higher MCC value indicates better performance on the classification data sets. The MCC is defined as follows:
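$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{4}$$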
where TP is the number of true positives, TN of true negatives, FP of false positives, and FN of false negatives.
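For reference, the binary MCC can be computed directly from the four confusion-matrix counts, as in the short sketch below. Returning 0 when the denominator vanishes is a common convention and an assumption here; the multi-class generalization used for the multi-class data sets is not covered.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumption): an empty row or column yields MCC = 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A classifier that labels every instance of a 90/10 imbalanced stream as the majority class reaches 90% accuracy but an MCC of 0, which is why MCC is preferred for imbalanced data.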
3.4 Experimental Results and Discussion
The results of the experiments for all data sets are presented in Table 2, which displays the MCC values and the corresponding number of retrainings for each real data set. The proposed method, N-DD, outperforms the other strategies (KSWIN, ADWIN, and DWM) on data sets with two or multiple classes. While KSWIN and DWM show promising results for the ELE, OZO, GMS, AIR, KEY, and GAS data sets owing to their adaptability to temporal dependencies, they require a significantly higher number of retrainings than N-DD. This is because both KSWIN and DWM assume that the data have the same distribution. In contrast, the proposed approach, N-DD, which is based on neutrosophic sets, is better suited to real-world data because it takes into account the nature of the data, which may not reflect the assumed distribution.
Similarly, for the OZO data set, both DWM and KSWIN show good results, as they are designed to handle imbalanced data and temporal dependencies. However, for the AIR data set, the most significant difference lies in the number of retrainings required by N-DD compared with KSWIN and DWM. While N-DD requires only 15 retrainings, KSWIN and DWM require 148 and 32 retrainings, respectively. Similarly, for the GAS data set, N-DD requires 16 retrainings, while KSWIN, ADWIN, and DWM require 109, 45, and 41 retrainings, respectively. In the COV data set, N-DD requires 32 retrainings, while KSWIN, ADWIN, and DWM require 141, 62, and 58 retrainings, respectively. N-DD thus requires the fewest retrainings and has the best predictive accuracy. This is because ADWIN needs all true labels for drift detection itself, while KSWIN and DWM rely on continuously obtaining all true labels over time to detect drift and make the corresponding training updates. N-DD, on the other hand, is capable of detecting drifts even when the true labels are unknown. This is particularly advantageous in real-world applications where obtaining true labels is difficult and costly. Additionally, N-DD utilizes uncertainty estimates from the entropy of neutrosophic sets, allowing it to take into account the impact of current input data on the prediction model's properties, rather than solely detecting changes in the input data. By considering the dependence of the prediction model's properties on uncertainty estimates, we can avoid unnecessary retraining.
Table 2 MCC values with number of retrainings
Overall, N-DD performs better than KSWIN, ADWIN, and DWM on both binary and multi-class data sets in terms of the number of retrainings, with only minimal differences in predictive performance. Rather than solely detecting changes in the input data, N-DD considers the dependence of the prediction model's properties on uncertainty estimates, which allows it to avoid unnecessary retraining.
Table 3 presents the accuracy of the proposed method, N-DD. The highest mean accuracy is obtained at the lowest level (scale 1) of neutrosophic entropy, which represents uncertainty.
Table 3 Accuracy of the proposed method on the data sets
The proposed method, N-DD, achieves high accuracy on data sets that contain binary or multiple classes. For example, it reaches an accuracy of 95% on the COV data set, which is a large data set with multiple classes, and 92% on the binary-class AIR data set. This can be attributed to the incorporation of uncertainty estimates from the entropy of neutrosophic sets in the N-DD method, which allows the method to consider the influence of current input data on the prediction model regardless of the type of data set.
Fig.2 illustrates the accuracy of the proposed method when applied to various data sets. The x-axis represents the level of uncertainty, determined by sorting the instances from lowest to highest based on their entropy values.
As shown in Fig.2, the accuracy of classifying real data sets depends on the level of uncertainty: as uncertainty increases, the accuracy decreases. This finding supports our research methodology, which considers both the current input data and the indeterminacy represented by neutrosophic sets in the properties of the prediction model. This approach differs from simply detecting changes in the input data. As a result, we have developed a solution for the indeterminate true label problem by measuring the level of uncertainty in the prediction made by N-SVM using neutrosophic set theory.
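The relation between accuracy and uncertainty can be examined with a few lines of code: sort the instances by entropy, split them into equally sized uncertainty scales, and compute the accuracy within each scale. The sketch below is illustrative; the function name and the number of scales (`n_scales`) are assumptions, since the text does not state how many levels were used.

```python
from typing import List, Sequence


def accuracy_by_uncertainty(
    entropies: Sequence[float],
    correct: Sequence[bool],
    n_scales: int = 5,
) -> List[float]:
    """Accuracy per uncertainty scale, from lowest to highest entropy.

    `correct[k]` states whether the prediction for instance k was right.
    """
    if len(entropies) != len(correct):
        raise ValueError("entropies and correct must have the same length")
    # Instance indices ordered from least to most uncertain.
    order = sorted(range(len(entropies)), key=lambda k: entropies[k])
    n = len(order)
    accuracies: List[float] = []
    for s in range(n_scales):
        idx = order[s * n // n_scales:(s + 1) * n // n_scales]
        if idx:  # skip empty scales when there are fewer instances than scales
            accuracies.append(sum(bool(correct[k]) for k in idx) / len(idx))
    return accuracies
```

A decreasing sequence of accuracies from the first to the last scale corresponds to the trend described for Fig.2.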
The comparative analysis shows that N-DD, proposed for detecting drift with indeterminate true labels, outperforms the other methods for several reasons. Whereas ADWIN must obtain accurate true labels in real time to identify drift and retrain accordingly, N-DD is able to detect drift even with indeterminate true labels. KSWIN assumes that the data are identically distributed, which may not hold in real-world data sets and can affect the performance of the method. DWM requires retraining of the prediction model, which can be computationally expensive, especially for large data sets; it also assumes that the data are independent and identically distributed, which may not hold in real-world data sets. In contrast, N-DD is less computationally expensive than KSWIN and DWM because it avoids unnecessary retraining by considering the impact of current input data on the properties of the prediction model. This is made possible by the use of neutrosophic theory, which is well-suited to real-world data.
Fig.2 Relation between prediction accuracy and uncertainty
The average running time of the methods studied was compared with that of the proposed method N-DD by applying them to real data sets using a Core-i7-8550U processor with 8 GB of memory.
The N-DD method has a lower average running time (0.96 s) than the KSWIN (1.1 s) and DWM (1.3 s) methods. The ADWIN method has a slightly better average running time of 0.91 s. N-DD's running time reflects our experimental tuning of its parameters: the tuned method required fewer retrainings, which keeps its average running time acceptable, whereas the other methods required more retrainings.
From these experiments, we highlight the following results:
1) N-DD is capable of detecting drifts even when the true labels are unknown. This is particularly advantageous in real-world applications where obtaining true labels is difficult and costly.
2) One of the main strengths of N-DD is its use of uncertainty estimates from entropy neutrosophic sets. This enables the algorithm to not only identify changes in the input data, but also consider the effects of these changes on the prediction model's characteristics. This approach helps to prevent unnecessary retraining, making N-DD more efficient and cost-effective.
3) N-DD has an acceptable average running time compared with other methods, such as KSWIN and DWM. This can be attributed to the proposed method, which requires a smaller number of retrainings.
In the N-DD approach, the reliability of drift detection may be compromised if the uncertainty estimates are unreliable. This is a common issue with other methods as well, as poor data quality can lead to inaccurate results. However, this is a rare occurrence and can be mitigated by measuring the uncertainty using neutrosophic entropy, which takes into account the nature of the data even when it does not reflect the assumed distribution.
4 Conclusions
In this study, we use neutrosophic set theory to develop an efficient algorithm, called N-DD, for detecting concept drift. This algorithm is specifically designed for situations where the true labels are indeterminate, and it takes into account the impact of the current input data on the properties of the model. To achieve this, our proposed method reads data into windows and calculates a set of values based on the concept of neutrosophic membership. These values are then used in the N-SVM. We measure the uncertainty of each prediction made by N-SVM as neutrosophic entropy, and these values are used as input for the ADWIN change detector. When a drift is detected, the prediction model is retrained on the original training data set together with only the most recent instances. This is a crucial adaptation for real-world drift detection, where true labels are expensive, indeterminate, and not always readily available. Unlike most state-of-the-art algorithms, which require access to the entire set of true labels, our method is based on neutrosophic set theory, which deals with indeterminate values. This makes our method well suited to detecting drift in situations where true labels are limited and indeterminate. Our experiments were conducted on seven real data sets, and the results demonstrate that N-DD outperforms other methods, such as ADWIN, KSWIN, and DWM, in detecting drift with the least amount of retraining needed. Additionally, N-DD had a lower average running time than KSWIN and DWM, with only a minimal difference from ADWIN.
In future work, we plan to enhance the N-DD method by incorporating a study of neutrosophic concepts and deep learning techniques. Furthermore, it is important to thoroughly evaluate the use of N-DD in cases where different types of indeterminacies may arise, such as uncertainty, conflicting information, and vague data. One approach to addressing these indeterminacies is through the use of refined neutrosophic logic[35], which specifically deals with inherent indeterminacy and allows for a more comprehensive analysis of uncertainty.