ComputingReviews.com

A survey on ensemble learning for data stream classification
Gomes H., Barddal J., Enembreck F., Bifet A. ACM Computing Surveys 50(2): 1-36, 2017. Type: Article

Date Reviewed: 06/16/17

The automation of processes such as business transactions, together with the spread of smartphones and various types of sensors, has greatly increased the number of data stream generators. In data stream classification, each data item is represented by a vector of features. Data items arrive at high frequency, in temporal order, and are practically endless. Labeling them requires correct online decisions that are efficient in both memory and time, and only a finite number of recent instances is available to learn from. Ensemble learners combine the decisions of several classifiers, called base learners. Base learners can be heterogeneous or homogeneous: they may use different parameters or different biases, or may learn from different data items. Base learners are expected to complement each other, so that the ensemble reaches better decisions than any of its members; ensembles whose members complement each other in this way are referred to as diverse.
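To make the combination step concrete, here is a minimal Python sketch of a homogeneous ensemble whose members learn from different data items and are combined by unweighted majority vote. It follows the general idea of online bagging (Oza and Russell), in which each member trains on every arriving instance k ~ Poisson(1) times; it is an illustration under stated assumptions, not a method taken from the surveyed paper. The `NearestCentroid` base learner and all class and function names are hypothetical, chosen only because an incremental class-mean classifier fits in a few lines. Evaluation is test-then-train: each instance is classified first and only then used for learning, so no stored history is needed.

```python
import math
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical incremental base learner: predicts the label whose
    running mean (centroid) is closest to the input vector."""

    def __init__(self):
        self.sums = {}    # label -> per-feature weighted sums
        self.counts = {}  # label -> total weight of instances seen

    def learn(self, x, y, weight=1):
        if weight == 0:
            return  # this member skips the instance entirely
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += weight * v
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        def sq_dist(label):
            c = self.counts[label]
            return sum((v - s / c) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.counts, key=sq_dist)


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's method (the stdlib has no Poisson sampler)."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class OnlineBaggingEnsemble:
    """Sketch of online bagging: every member sees each instance k ~ Poisson(1)
    times, so members learn from different (re-weighted) data items."""

    def __init__(self, n_members=5, seed=0):
        self.rng = random.Random(seed)
        self.members = [NearestCentroid() for _ in range(n_members)]

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        votes.pop(None, None)  # ignore members that have not learned anything yet
        return votes.most_common(1)[0][0] if votes else None

    def learn(self, x, y):
        for m in self.members:
            m.learn(x, y, weight=poisson1(self.rng))


# Toy stream, evaluated test-then-train (prequential).
stream = [((0.0, 0.1), "a"), ((1.0, 0.9), "b"),
          ((0.1, 0.0), "a"), ((0.9, 1.0), "b")] * 25
ens = OnlineBaggingEnsemble(n_members=5, seed=1)
correct = 0
for x, y in stream:
    correct += ens.predict(x) == y  # test first...
    ens.learn(x, y)                 # ...then train on the same instance
print(f"prequential accuracy: {correct / len(stream):.2f}")
```

Drawing a separate Poisson weight per member is what makes the members diverse here: with identical weights every member would converge to the same centroids, and the vote would add nothing over a single learner.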

An ensemble may contain base learners that frequently make incorrect decisions, or learners that contribute little because their decisions largely duplicate those of other members. Such learners should be pruned, and new ones should be given a chance. During classification, new labels may appear and some existing labels may disappear. The input vectors associated with a given label may change, or input patterns that remain the same may come to be assigned different labels. Input patterns that have been absent for some time may reappear, and remembering how they were classified before can improve both efficiency and effectiveness. Finally, the labels of certain input patterns may show temporal dependencies.
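The pruning-and-replacement idea can be sketched along the lines of Dynamic Weighted Majority (Kolter and Maloof), one well-known scheme of this kind; this is a hedged illustration, not the surveyed paper's own method. Each member carries a weight that is multiplied by `beta` whenever it errs, members whose normalized weight falls below `theta` are pruned, and a fresh member is added when the weighted vote is wrong. The class names, the `NearestCentroid` base learner, and the parameter values are hypothetical. The usage example swaps the labels halfway through the stream to imitate the case where input patterns that remain the same are assigned different labels.

```python
from collections import defaultdict


class NearestCentroid:
    """Hypothetical incremental base learner: nearest running class mean."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def learn(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None
        return min(self.counts, key=lambda c: sum(
            (v - s / self.counts[c]) ** 2 for v, s in zip(x, self.sums[c])))


class PruningEnsemble:
    """DWM-style sketch: members that err lose weight, members whose weight
    decays below theta are pruned, and a new member joins when the ensemble
    errs. beta, theta and max_members are illustrative, untuned values."""

    def __init__(self, beta=0.5, theta=0.01, max_members=10):
        self.beta, self.theta, self.max_members = beta, theta, max_members
        self.members = [[NearestCentroid(), 1.0]]  # [learner, weight] pairs

    def predict(self, x):
        scores = defaultdict(float)
        for learner, w in self.members:
            label = learner.predict(x)
            if label is not None:
                scores[label] += w  # weighted vote
        return max(scores, key=scores.get) if scores else None

    def learn(self, x, y):
        ensemble_wrong = self.predict(x) != y
        for pair in self.members:
            if pair[0].predict(x) != y:
                pair[1] *= self.beta  # penalize a member that erred
        top = max(w for _, w in self.members)
        for pair in self.members:
            pair[1] /= top  # renormalize so the best member has weight 1
        # Prune members whose weight has decayed below theta.
        self.members = [p for p in self.members if p[1] >= self.theta]
        if ensemble_wrong and len(self.members) < self.max_members:
            self.members.append([NearestCentroid(), 1.0])  # give a new learner a chance
        for learner, _ in self.members:
            learner.learn(x, y)


# Toy stream whose labels swap halfway through (the same inputs get new labels).
stream = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")] * 30
stream += [((0.0, 0.0), "b"), ((1.0, 1.0), "a")] * 30
ens = PruningEnsemble()
for x, y in stream:
    ens.learn(x, y)
print(ens.predict((0.0, 0.0)), len(ens.members))
```

After the swap, the old members keep voting for the outdated labels, so their weights decay geometrically until they are pruned, while members created after the change take over the vote. The sketch updates weights on every instance and omits refinements of the original algorithm, such as performing updates only every few instances.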

This paper is a comprehensive survey on ensemble learning for data stream classification, and it is easy to read. Its taxonomy, built from 65 classifiers, is a good summary of the design possibilities and highlights where the classifiers are similar and where they differ. The included list of open-source software resources is useful, and the discussion of future research problems provides good pointers.

The paper will be useful to anybody interested in ensemble learning, with or without data streams. Another good survey on the same topic, published recently [1], shows how active the area is. I expect to see some significant breakthroughs on this topic in the near future.

[1] Krawczyk, B.; Minku, L. L.; Gama, J.; Stefanowski, J.; Woźniak, M. Ensemble learning for data stream analysis: a survey. Information Fusion 37 (2017), 132–156.

Reviewer: F. Can

Review #: CR145356 (1708-0564)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™