Big data is an active field of research with considerable industry potential for production-ready applications. Because the field is relatively new, there is ample room for innovation. Big data can be seen everywhere: mouse clicks on an e-commerce website, customer preferences in a shop, intrusion detection, log patterns, and so on. Most small companies seem to be unaware of the potential information hidden in their data streams, owing to the technical difficulty and the novelty of the research field (for example, the lack of easy-to-use tools).
Data stream mining is a complex matter that requires skill to obtain the best performance and the most accurate predictions. A common problem in data stream mining is that the “underlying data distribution of newly arrived data may appear differently than the old one in the real world.” In other words, the underlying concept that the model predicts changes in unforeseen ways (for example, a customer in a shop changes behavior because of new advertisements, promotions, and so on). This is the concept-drift phenomenon, an issue that may degrade the model's prediction accuracy over time.
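To make the effect of concept drift concrete, here is a minimal sketch in Python. It is not taken from the paper: the stream, the fixed rule `old_model`, and the helper `windowed_accuracy` are all hypothetical and chosen only for illustration. A rule learned from old data is scored over a sliding window, and its accuracy collapses once customer behavior changes.

```python
from collections import deque

def windowed_accuracy(predictions, labels, window=50):
    """Return, for each position, the accuracy over the last `window` items."""
    recent = deque(maxlen=window)  # keeps only the most recent outcomes
    history = []
    for predicted, actual in zip(predictions, labels):
        recent.append(predicted == actual)
        history.append(sum(recent) / len(recent))
    return history

# Hypothetical rule learned from old data: customers buy cheap items.
def old_model(price):
    return price < 10

# Hypothetical stream of 200 prices alternating between cheap and expensive.
prices = [5, 15] * 100

# First 100 items follow the old concept; after a (made-up) promotion the
# behavior flips and customers buy the expensive items instead.
labels = [(p < 10) if i < 100 else (p >= 10) for i, p in enumerate(prices)]

predictions = [old_model(p) for p in prices]
accuracy = windowed_accuracy(predictions, labels, window=50)

print(accuracy[99], accuracy[124], accuracy[199])
```

The model itself never changes here; only the data does, which is why a stream learner needs some way to notice the drop and adapt.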
Several concept-drift detection techniques have been proposed, for example, re-training, change-detection tests, and decision trees. Decision trees, the last of these, are a widely used data classification technique and fall into two categories: multi-tree and single-tree algorithms. A decision tree holds the classifications of the data and their properties. The predicate that is associated with a node is called the split-condition.
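The notion of a split-condition can be sketched as follows. This is an illustrative data structure only, not the paper's implementation; the `Node` class, the `classify` function, and the `clicks` attribute are assumptions made for the example. Each internal node carries a predicate that routes an instance to one of two children, and each leaf carries a class label.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # Predicate tested at an internal node; None marks a leaf.
    split_condition: Optional[Callable[[dict], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    label: Optional[str] = None  # class label, set for leaves only

def classify(node, instance):
    """Follow split-conditions from the root down to a leaf label."""
    while node.split_condition is not None:
        node = node.if_true if node.split_condition(instance) else node.if_false
    return node.label

# Hypothetical single-split tree over a made-up "clicks" attribute.
tree = Node(
    split_condition=lambda x: x["clicks"] > 3,
    if_true=Node(label="buyer"),
    if_false=Node(label="browser"),
)

print(classify(tree, {"clicks": 5}))
print(classify(tree, {"clicks": 1}))
```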
The authors propose iOVFDT, a single-tree algorithm with an optimized node-splitting mechanism to detect concept drift. They performed experiments with publicly available datasets (for example, MOA [1]) to compare the performance of the new method with that of other algorithms. The evaluation shows that iOVFDT obtains better accuracy and uses less memory. Of course, in intelligent systems, such comparisons vary from context to context. Thanks to this comparison, the paper contains useful pointers for both industrial and research innovation.
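For readers unfamiliar with node splitting on streams, the sketch below shows the standard Hoeffding-bound test used by very fast decision tree (VFDT) style learners, the family that the name iOVFDT suggests. This is an assumption made for illustration: the review does not describe iOVFDT's own optimized criterion, and the function names and the numeric values here are hypothetical. A leaf is split only when the best attribute beats the runner-up by more than the bound, which shrinks as more instances arrive.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split measure, delta is the allowed
    probability of a wrong decision, and n is the number of instances seen.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the gap between the two best attributes exceeds the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Made-up gains: the same gap is not yet convincing after 100 instances,
# but it is after 1000.
early = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
print(early, later)
```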