Classification for Multi-label Data Streams Based on Random Labelsets

  • Abstract: To address concept drift while accounting for dependencies between labels, this paper proposes an ensemble classification algorithm for multi-label data streams based on random labelsets. The basic idea is to randomly partition the original, larger label set into several smaller label subsets, following the RAkEL algorithm, and to train a probabilistic classifier chain on each subset. This exploits the dependencies between labels while reducing the time complexity of the probabilistic classifier chain. In addition, an adaptive sliding-window algorithm is embedded as a change detector to handle concept drift. Experimental results on both synthetic and real-world data streams show that, compared with other algorithms, the method predicts the label set of an instance more effectively on most datasets and is better suited to environments with concept drift.
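The partitioning step and its effect on cost can be illustrated with a minimal sketch. It assumes disjoint, fixed-size subsets and assumes that each probabilistic classifier chain performs exhaustive inference over every label combination in its subset; the abstract does not state either detail. The function names `random_label_partition` and `exhaustive_inference_cost` are hypothetical and do not come from the paper.

```python
import random


def random_label_partition(num_labels, subset_size, seed=None):
    """Randomly split label indices 0..num_labels-1 into disjoint subsets.

    Assumption: subsets are disjoint and of fixed size (the last one may be
    smaller); the paper may choose its subsets differently.
    """
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    rng = random.Random(seed)
    labels = list(range(num_labels))
    rng.shuffle(labels)
    # Consecutive slices of the shuffled list form the label subsets.
    return [
        sorted(labels[i:i + subset_size])
        for i in range(0, num_labels, subset_size)
    ]


def exhaustive_inference_cost(subset_sizes):
    """Count label combinations scored if every chain enumerates all 2^k
    combinations of its k labels (assumed inference strategy)."""
    return sum(2 ** k for k in subset_sizes)


subsets = random_label_partition(num_labels=12, subset_size=3, seed=0)

# One chain over all 12 labels versus four chains over 3 labels each.
full_cost = exhaustive_inference_cost([12])
split_cost = exhaustive_inference_cost([len(s) for s in subsets])
print(subsets)
print(full_cost, split_cost)
```

Under these assumptions, a single chain over 12 labels scores 4096 label combinations per instance, whereas four chains over 3 labels each score 32 in total. Dependencies are still modeled within each subset, but not across subsets.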

     
