Computing Reviews
Countering the concept-drift problems in big data by an incrementally optimized stream mining model
Yang H., Fong S. Journal of Systems and Software 102(C): 158-166, 2015. Type: Article
Date Reviewed: Dec 1 2015

Big data is a compelling field of research with substantial industry potential for production-ready applications. Since the field is relatively new, there is ample room for innovation. Indeed, big data can be seen everywhere: mouse clicks on an e-commerce website, customer preferences in a shop, intrusion detection, log patterns, and so on. Most small companies seem unaware of the information hidden in their data streams, due to the technical demands and the novelty of the research field (for example, the lack of easy-to-use tools).

Data stream mining is a complex matter that requires skill to obtain the best performance and the best predictions. A common problem in data stream mining is that the “underlying data distribution of newly arrived data may appear differently than the old one in the real world.” In other words, the underlying (predicted) model changes in unforeseen ways (for example, a customer in a shop changes his behavior in response to new advertisements, promotions, and so on). This is the concept-drift phenomenon, an issue that can degrade a model's prediction accuracy over time.
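The phenomenon is easy to demonstrate. Here is a toy sketch (the stream and labeling rule are hypothetical, not drawn from the paper) in which a model fitted to the old concept keeps being applied after the underlying concept changes, and its accuracy collapses:

```python
import random

random.seed(0)

def stream(n):
    """Yield (x, label) pairs whose underlying concept drifts halfway:
    before the drift the true label is x > 0.5, afterwards it is x < 0.5."""
    for i in range(n):
        x = random.random()
        label = (x > 0.5) if i < n // 2 else (x < 0.5)
        yield x, label

# A static "model" that learned only the old concept.
model = lambda x: x > 0.5

correct_before = correct_after = 0
for i, (x, y) in enumerate(stream(10000)):
    hit = model(x) == y
    if i < 5000:
        correct_before += hit
    else:
        correct_after += hit

print(correct_before / 5000)  # 1.0: perfect on the old concept
print(correct_after / 5000)   # 0.0: accuracy collapses after the drift
```

A drift-aware learner would instead detect the distribution change and adapt (or rebuild) its model rather than keep predicting with stale statistics.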

Concept-drift detection techniques have been proposed, for example, re-training, change-detection tests, and decision trees. The last of these, decision trees, are widely used data classification techniques and fall into two categories: multi-tree and single-tree algorithms. A decision tree encodes instances of the data classifications and their properties, and the predicate associated with an internal node is called the split-condition.
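As a minimal sketch of how a single tree routes an instance through split-conditions (the node structure, attribute names, and predicates here are illustrative assumptions, not the paper's):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # The split-condition: a predicate over an instance; None marks a leaf.
    split: Optional[Callable[[dict], bool]] = None
    left: "Node" = None    # branch taken when the split-condition holds
    right: "Node" = None   # branch taken otherwise
    label: str = ""        # class label stored at a leaf

def classify(node, instance):
    """Follow split-conditions from the root down to a leaf."""
    while node.split is not None:
        node = node.left if node.split(instance) else node.right
    return node.label

# A two-level single tree: root splits on "age", one child splits on "income".
tree = Node(
    split=lambda x: x["age"] > 30,
    left=Node(label="declines offer"),
    right=Node(
        split=lambda x: x["income"] > 50_000,
        left=Node(label="accepts offer"),
        right=Node(label="declines offer"),
    ),
)

print(classify(tree, {"age": 25, "income": 60_000}))  # accepts offer
```

In a stream setting, the interesting question is *when* a leaf has seen enough examples to justify installing a new split-condition, which is exactly what the paper's node-splitting mechanism optimizes.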

The authors propose a single tree with an optimized node-splitting mechanism to detect the concept-drift: iOVFDT. They performed experiments with publicly available datasets (for example, the MOA [1]) to compare the performance of the new method with other algorithms: in fact, the evaluation of the results over iOVFDT shows that it obtains better accuracy and uses less memory. Of course, in intelligent systems, such comparisons vary from context to context. Due to this comparison, this paper contains useful pointers that can be used in both industrial and research innovation.
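iOVFDT belongs to the family of Hoeffding trees descended from VFDT. As a rough sketch of the kind of statistical node-splitting test such trees use (not the authors' exact optimized mechanism), a node splits only when the Hoeffding bound guarantees that the best attribute's information gain truly exceeds the runner-up's:

```python
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the observed mean of
    n samples lies within epsilon of the true mean (Hoeffding's inequality)."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, value_range=1.0, delta=1e-7):
    """Split when the best attribute beats the runner-up by more than the
    bound: with more data, the choice is then unlikely to change."""
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)

# Same gain gap, different amounts of evidence (illustrative numbers).
print(should_split(0.30, 0.25, n=200))    # False: not enough examples yet
print(should_split(0.30, 0.25, n=5000))   # True: gap exceeds the bound
```

The practical appeal is that the tree commits to a split incrementally, from a bounded number of stream examples, without ever storing the whole stream.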

Reviewer: Massimiliano Masi. Review #: CR143985 (1602-0133)
[1] Bifet, A.; Holmes, G.; Kirkby, R.; Pfahringer, B. MOA: massive online analysis. Journal of Machine Learning Research 11 (2010), 1601–1604.
Categories: Data Mining (H.2.8); Trees (G.2.2)
Other reviews under "Data Mining":
- Feature selection and effective classifiers. Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed). Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article. Reviewed: May 1 1999
- Rule induction with extension matrices. Wu X. (ed). Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article. Reviewed: Jul 1 1998
- Predictive data mining. Weiss S., Indurkhya N. Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032). Reviewed: Feb 1 1999

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®