Semi-Random Forest Classification Based on Closed Frequent Pattern for Data Streams
Abstract
To address the noise and concept drift present in data streams, a Semi-Random Forest Classification algorithm based on Closed Frequent Patterns (SRFCFP) is proposed. SRFCFP represents the input data stream with closed frequent patterns, which removes redundant information and noise and highlights the characteristics of the data. A semi-random forest classifier is then built on this representation, and a pattern set updating mechanism based on a time decay model is proposed to handle the continuous data stream. To detect and adapt to concept drift promptly, a difference measure for pattern sets is also proposed, which uses the mined patterns to measure changes in the data distribution. Experiments were performed in the MOA (Massive Online Analysis) framework on both real-world and synthetic datasets. The results show that the proposed method achieves higher average accuracy than the compared algorithms and deals effectively with concept drift and noise.
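As a rough illustration of the two mechanisms named in the abstract, the following Python sketch shows one way a time-decayed pattern set and a pattern set difference measure could be written. The abstract gives no formulas, so everything here is an assumption made for illustration: the exponential decay, the weighted Jaccard distance, and all function names, parameters, and thresholds are hypothetical and are not taken from SRFCFP itself. Mining the closed frequent patterns is outside the scope of the sketch.

```python
import math


def decayed_support(timestamps, now, decay_rate=0.1):
    """Sum of exponentially decayed weights over a pattern's occurrences.

    Assumption: exponential decay; the paper's decay model may differ.
    """
    return sum(math.exp(-decay_rate * (now - t)) for t in timestamps)


def update_pattern_set(occurrences, now, min_support=0.5, decay_rate=0.1):
    """Keep only patterns whose decayed support is at least min_support.

    `occurrences` maps each pattern to the timestamps at which it was seen.
    """
    kept = {}
    for pattern, timestamps in occurrences.items():
        support = decayed_support(timestamps, now, decay_rate)
        if support >= min_support:
            kept[pattern] = support
    return kept


def pattern_set_difference(old, new):
    """Weighted Jaccard distance between two pattern sets, in [0, 1].

    Assumption: a stand-in for the paper's difference measure.
    0.0 means identical supports, 1.0 means no shared patterns.
    """
    keys = set(old) | set(new)
    overlap = sum(min(old.get(k, 0.0), new.get(k, 0.0)) for k in keys)
    total = sum(max(old.get(k, 0.0), new.get(k, 0.0)) for k in keys)
    if total == 0:
        return 0.0
    return 1.0 - overlap / total


# Hypothetical usage: flag drift when the pattern sets diverge enough.
old_set = {frozenset({"a", "b"}): 2.0, frozenset({"c"}): 1.0}
new_set = {frozenset({"a", "b"}): 1.0, frozenset({"d"}): 1.0}
drift_detected = pattern_set_difference(old_set, new_set) > 0.5
```

In a sketch like this, a drift signal would presumably trigger rebuilding or updating the classifier on the refreshed pattern set; the threshold of 0.5 is arbitrary and would need tuning per stream.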