Computing Reviews
Subspace clustering of data streams: new algorithms and effective evaluation measures
Hassani M., Kim Y., Choi S., Seidl T. Journal of Intelligent Information Systems 45(3): 319-335, 2015. Type: Article
Date Reviewed: Jun 7 2016

Recently, much attention has been paid to data that evolve over time, usually called data streams. This paper contributes to comparing the effectiveness of existing subspace clustering algorithms on data streams. Subspace clustering algorithms associate each cluster with a subset of the original dimensions, its subspace, instead of the full description space.
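To make the notion of a subspace cluster concrete, the Python sketch below measures a point's distance to each cluster only over that cluster's own dimensions, using the Manhattan segmental distance of PROCLUS [4] (the average absolute difference over the cluster's dimensions). The function names, the (center, dims) cluster representation, and the example values are illustrative assumptions, not code from the reviewed paper.

```python
def manhattan_segmental(x, y, dims):
    """Average absolute difference between x and y over the given dimensions only."""
    return sum(abs(x[i] - y[i]) for i in dims) / len(dims)

def assign(point, clusters):
    """Index of the subspace cluster closest to point; each cluster is (center, dims)."""
    return min(range(len(clusters)),
               key=lambda c: manhattan_segmental(point, clusters[c][0], clusters[c][1]))

# Hypothetical clusters: cluster 0 lives in dimensions {0, 2}, cluster 1 in dimension {1}.
clusters = [((0, 0, 0), [0, 2]), ((50, 100, 50), [1])]
print(assign((1, 90, 5), clusters))  # prints 0
```

Here the point is 3.0 away from cluster 0 in its subspace and 10.0 away from cluster 1 in its subspace, so it goes to cluster 0, even though a full-space Euclidean comparison would put it closer to the center of cluster 1.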

The main contribution of this work is a new evaluation measure dedicated to subspace clustering. The measure, called SubCMM (short for subspace cluster mapping measure), is derived from CMM [1], but it assigns penalties at the level of (object, attribute) pairs rather than whole objects. This way, the dimensions outside a cluster's subspace are ignored in the computation, so the measure better reflects how well the found clusters fit the ground truth within their own subspaces.
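The idea of penalizing (object, attribute) pairs can be illustrated with a deliberately simplified score. This is not SubCMM itself: the real measure builds on CMM's connectivity- and age-based weighting, which this sketch omits, so every pair error here costs 1. The helper names `pairs` and `pair_level_score` are hypothetical. Each found cluster is mapped to the ground-truth cluster it shares the most pairs with; noise pairs, misplaced pairs, and missed ground-truth pairs count as errors.

```python
from collections import Counter

def pairs(objects, dims):
    """Expand a subspace cluster (objects x dims) into its (object, dimension) pairs."""
    return {(o, d) for o in objects for d in dims}

def pair_level_score(found, truth):
    """Simplified pair-level quality score in [0, 1]; 1.0 means a perfect match.

    found, truth: lists of (objects, dims) subspace clusters.
    Errors are counted on (object, dimension) pairs, so dimensions outside
    a cluster's subspace never enter the computation. Unlike SubCMM, this
    sketch uses no connectivity or age weights: every pair error costs 1.
    """
    truth_label = {}
    for label, (objs, dims) in enumerate(truth):
        for p in pairs(objs, dims):
            truth_label[p] = label
    covered, errors, found_total = set(), 0, 0
    for objs, dims in found:
        fp = pairs(objs, dims)
        covered |= fp
        found_total += len(fp)
        # Map the found cluster to the ground-truth cluster sharing most pairs.
        counts = Counter(truth_label[p] for p in fp if p in truth_label)
        mapped = counts.most_common(1)[0][0] if counts else None
        # Penalize noise pairs (no ground-truth cluster) and misplaced pairs.
        errors += sum(1 for p in fp if p not in truth_label or truth_label[p] != mapped)
    # Penalize ground-truth pairs that no found cluster covers (missed pairs).
    errors += sum(1 for p in truth_label if p not in covered)
    total = len(truth_label) + found_total
    return 1.0 - errors / total if total else 1.0

truth = [({0, 1, 2}, {0, 1})]
print(pair_level_score([({0, 1, 2}, {0, 1})], truth))     # prints 1.0
print(pair_level_score([({0, 1, 2}, {0, 1, 2})], truth))  # 3 extra noise pairs lower the score
```

In the second call the found cluster has the right objects but one extra dimension; only the three pairs in that dimension are penalized, rather than treating all three objects as wrongly clustered.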

In addition, the authors propose a new procedure that can be deployed in their subspace massive online analysis (MOA) framework [2]. This procedure uses a two-step online-offline approach, similar to the one used in CluStream [3]: first, an online step summarizes the incoming data points into microclusters, and then an offline step performs the subspace clustering on these summaries.
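In CluStream [3], the online component maintains microclusters as cluster features and the offline component clusters those summaries. That split can be sketched in Python as below, assuming CluStream-style features (count, linear sum, squared sum) and omitting CluStream's time statistics and snapshot storage. The offline step is plain weighted k-means on microcluster centroids, swapped in for simplicity; in the reviewed procedure, this is presumably where a subspace clustering algorithm would run. All names and parameters (`online_step`, `offline_step`, `max_mcs`, `boundary`, `default_radius`) are hypothetical.

```python
import math

class MicroCluster:
    """CluStream-style cluster feature (N, LS, SS); time statistics omitted."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)             # linear sum per dimension
        self.ss = [x * x for x in point]  # squared sum per dimension

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square deviation from the centroid, averaged over dimensions.
        var = [ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss)]
        return math.sqrt(max(0.0, sum(var) / len(var)))

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

def online_step(stream, max_mcs=10, boundary=2.0, default_radius=1.0):
    """One pass over the stream, keeping at most max_mcs microclusters."""
    mcs = []
    for p in stream:
        if mcs:
            nearest = min(mcs, key=lambda m: math.dist(p, m.centroid()))
            # A singleton has zero radius, so use a fixed default boundary instead.
            r = nearest.radius() if nearest.n > 1 else default_radius
            if math.dist(p, nearest.centroid()) <= boundary * r:
                nearest.absorb(p)
                continue
        mcs.append(MicroCluster(p))
        if len(mcs) > max_mcs:
            # Bound memory by merging the two closest microclusters.
            i, j = min(((a, b) for a in range(len(mcs)) for b in range(a + 1, len(mcs))),
                       key=lambda ab: math.dist(mcs[ab[0]].centroid(), mcs[ab[1]].centroid()))
            mcs[i].merge(mcs.pop(j))
    return mcs

def offline_step(mcs, k, iters=20):
    """Weighted k-means on microcluster centroids (a stand-in macro-clustering)."""
    points = [m.centroid() for m in mcs]
    weights = [m.n for m in mcs]
    dim = len(points[0])
    # Deterministic farthest-point initialization, starting from the heaviest summary.
    centers = [points[weights.index(max(weights))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            best = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[best].append((p, w))
        for j, group in enumerate(groups):
            if group:
                total = sum(w for _, w in group)
                centers[j] = [sum(q[d] * w for q, w in group) / total for d in range(dim)]
    return centers

stream = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (10.2, 10.0), (0.0, 0.2), (10.0, 10.2)]
mcs = online_step(stream)
centers = sorted(offline_step(mcs, k=2))
print(len(mcs), centers)
```

The offline step never touches raw points, only the weighted summaries, which is what lets the clustering be rerun on demand while the stream keeps arriving.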

The tested subspace clustering algorithms are taken from the literature [3,4]. Unfortunately, the experiments, which, along with the evaluation measure, should constitute the main contribution of the paper, are rather disappointing: the authors consider only two synthetic data sets with three dimensions. This is far from the paper’s initial claim to deal with “high dimensionality.”

Reviewer: Julien Velcin
Review #: CR144473 (1608-0597)
1) Kremer, H.; Kranen, P.; Jansen, T.; Seidl, T.; Bifet, A.; Holmes, G.; Pfahringer, B. An effective evaluation measure for clustering on evolving data streams. In Proc. of KDD. ACM, 2011, 868–876.
2) Hassani, M.; Kim, Y.; Seidl, T. Database systems for advanced applications. Springer, 2013.
3) Aggarwal, C. C.; Han, J.; Wang, J.; Yu, P. S. A framework for clustering evolving data streams. In Proc. of VLDB. Springer, 2003, 81–92.
4) Aggarwal, C. C.; Wolf, J. L.; Yu, P. S.; Procopiuc, C.; Park, J. S. Fast algorithms for projected clustering. SIGMOD Record 28, 2 (1999), 61–72.
Clustering (H.3.3 ... )
Other reviews under "Clustering":
Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
Can F. (ed), Ozkarahan E. ACM Transactions on Database Systems 15(3): 483-517, 1990. Type: Article
Dec 1 1992
A parallel algorithm for record clustering
Omiecinski E., Scheuermann P. ACM Transactions on Database Systems 15(3): 599-624, 1990. Type: Article
Nov 1 1992
Organization of clustered files for consecutive retrieval
Deogun J., Raghavan V., Tsou T. ACM Transactions on Database Systems 9(4): 646-671, 1984. Type: Article
Jun 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®