Computing Reviews
Countering the concept-drift problems in big data by an incrementally optimized stream mining model
Yang H., Fong S. Journal of Systems and Software 102(C): 158-166, 2015. Type: Article
Date Reviewed: Dec 1 2015

Big data is a compelling field of research with substantial industry potential for production-ready applications. Since the field is relatively new, there is ample room for innovation. Indeed, big data can be seen everywhere: mouse clicks on an e-commerce website, customer preferences in a shop, intrusion detection, log patterns, and so on. Most small companies seem unaware of the information hidden in their data streams, due to the technical demands and the novelty of the research field (for example, the lack of easy-to-use tools).

Data stream mining is a complex matter that requires skill to obtain the best performance and the best predictions. A common problem in data stream mining is that the “underlying data distribution of newly arrived data may appear differently than the old one in the real world.” In other words, the underlying (predicted) model changes in unforeseen ways (for example, a customer in a shop changes his behavior in response to new advertisements, promotions, and so on). This is the concept-drift phenomenon, an issue that can degrade a model's prediction accuracy over time.
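The phenomenon is easy to demonstrate. Here is a toy sketch (the stream and labeling rule are hypothetical, not drawn from the paper) in which a model fitted to the old concept keeps being applied after the underlying concept changes, and its accuracy collapses:

```python
import random

random.seed(0)

def stream(n):
    """Yield (x, label) pairs whose underlying concept drifts halfway:
    before the drift the true label is x > 0.5, afterwards it is x < 0.5."""
    for i in range(n):
        x = random.random()
        label = (x > 0.5) if i < n // 2 else (x < 0.5)
        yield x, label

# A static "model" that learned only the old concept.
model = lambda x: x > 0.5

correct_before = correct_after = 0
for i, (x, y) in enumerate(stream(10000)):
    hit = model(x) == y
    if i < 5000:
        correct_before += hit
    else:
        correct_after += hit

print(correct_before / 5000)  # 1.0: perfect on the old concept
print(correct_after / 5000)   # 0.0: accuracy collapses after the drift
```

A drift-aware learner would instead detect the distribution change and adapt (or rebuild) its model rather than keep predicting with stale statistics.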

Concept-drift detection techniques have been proposed, for example, re-training, change-detection tests, and decision trees. The last of these, decision trees, are widely used data classification techniques and fall into two categories: multi-tree and single-tree algorithms. A decision tree encodes instances of the data classifications and their properties, and the predicate associated with an internal node is called the split-condition.
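As a minimal sketch of how a single tree routes an instance through split-conditions (the node structure, attribute names, and predicates here are illustrative assumptions, not the paper's):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # The split-condition: a predicate over an instance; None marks a leaf.
    split: Optional[Callable[[dict], bool]] = None
    left: "Node" = None    # branch taken when the split-condition holds
    right: "Node" = None   # branch taken otherwise
    label: str = ""        # class label stored at a leaf

def classify(node, instance):
    """Follow split-conditions from the root down to a leaf."""
    while node.split is not None:
        node = node.left if node.split(instance) else node.right
    return node.label

# A two-level single tree: root splits on "age", one child splits on "income".
tree = Node(
    split=lambda x: x["age"] > 30,
    left=Node(label="declines offer"),
    right=Node(
        split=lambda x: x["income"] > 50_000,
        left=Node(label="accepts offer"),
        right=Node(label="declines offer"),
    ),
)

print(classify(tree, {"age": 25, "income": 60_000}))  # accepts offer
```

In a stream setting, the interesting question is *when* a leaf has seen enough examples to justify installing a new split-condition, which is exactly what the paper's node-splitting mechanism optimizes.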

The authors propose a single tree with an optimized node-splitting mechanism to detect the concept-drift: iOVFDT. They performed experiments with publicly available datasets (for example, the MOA [1]) to compare the performance of the new method with other algorithms: in fact, the evaluation of the results over iOVFDT shows that it obtains better accuracy and uses less memory. Of course, in intelligent systems, such comparisons vary from context to context. Due to this comparison, this paper contains useful pointers that can be used in both industrial and research innovation.
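iOVFDT belongs to the family of Hoeffding trees descended from VFDT. As a rough sketch of the kind of statistical node-splitting test such trees use (not the authors' exact optimized mechanism), a node splits only when the Hoeffding bound guarantees that the best attribute's information gain truly exceeds the runner-up's:

```python
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the observed mean of
    n samples lies within epsilon of the true mean (Hoeffding's inequality)."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, value_range=1.0, delta=1e-7):
    """Split when the best attribute beats the runner-up by more than the
    bound: with more data, the choice is then unlikely to change."""
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)

# Same gain gap, different amounts of evidence (illustrative numbers).
print(should_split(0.30, 0.25, n=200))    # False: not enough examples yet
print(should_split(0.30, 0.25, n=5000))   # True: gap exceeds the bound
```

The practical appeal is that the tree commits to a split incrementally, from a bounded number of stream examples, without ever storing the whole stream.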

Reviewer: Massimiliano Masi. Review #: CR143985 (1602-0133)
[1] Bifet, A.; Holmes, G.; Kirkby, R.; Pfahringer, B. MOA: massive online analysis. Journal of Machine Learning Research 11 (2010), 1601–1604.
Categories: Data Mining (H.2.8); Trees (G.2.2)
Other reviews under "Data Mining":
- Feature selection and effective classifiers. Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed). Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article. Reviewed: May 1 1999
- Rule induction with extension matrices. Wu X. (ed). Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article. Reviewed: Jul 1 1998
- Predictive data mining. Weiss S., Indurkhya N. Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032). Reviewed: Feb 1 1999

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®