Computing Reviews

Subspace clustering of data streams: new algorithms and effective evaluation measures
Hassani M., Kim Y., Choi S., Seidl T. Journal of Intelligent Information Systems 45(3): 319-335, 2015. Type: Article
Date Reviewed: 06/07/16

Recently, much attention has been paid to data that evolve over time, usually called data streams. This paper offers a way to compare the efficiency of existing subspace clustering algorithms on data streams. Subspace clustering algorithms are those that associate each cluster with a subset of the initial description space.

The main contribution of this work lies in the definition of a new evaluation measure dedicated to this subspace issue. The measure, called SubCMM (short for subspace cluster mapping measure), is derived from CMM [1], but it applies its penalties at the level of (object, attribute) pairs rather than whole objects. This way, the dimensions outside the selected subspace are ignored in the calculation, so the measure reflects how well each cluster fits its own subspace.
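To illustrate why scoring at the level of (object, attribute) pairs matters, here is a minimal Python sketch. It is not the SubCMM formula: it swaps in a plain Jaccard overlap between the pairs covered by a ground truth and by a found clustering, and the names `to_pairs` and `pair_overlap` are hypothetical. It only shows that a cluster with the right objects but the wrong subspace is penalized once attributes are taken into account.

```python
from typing import Iterable, Set, Tuple

# Assumption: a subspace cluster is (object ids, relevant attribute indices).
SubspaceCluster = Tuple[Set[int], Set[int]]


def to_pairs(clusters: Iterable[SubspaceCluster]) -> Set[Tuple[int, int]]:
    """Expand subspace clusters into the (object, attribute) pairs they cover."""
    pairs = set()
    for objects, attributes in clusters:
        for obj in objects:
            for attr in attributes:
                pairs.add((obj, attr))
    return pairs


def pair_overlap(truth: Iterable[SubspaceCluster],
                 found: Iterable[SubspaceCluster]) -> float:
    """Jaccard overlap of covered pairs (a stand-in, not SubCMM itself)."""
    t, f = to_pairs(truth), to_pairs(found)
    union = t | f
    return len(t & f) / len(union) if union else 1.0


# Same three objects, but the found cluster also claims attribute 2.
truth = [({0, 1, 2}, {0, 1})]
found = [({0, 1, 2}, {0, 1, 2})]
score = pair_overlap(truth, found)  # 6 shared pairs out of 9 in the union
```

An object-level comparison would rate `found` as perfect here, since the object sets are identical; the pair-level view exposes the extra, irrelevant dimension.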

Besides, the authors propose a new procedure that can be deployed in their subspace massive online analysis (MOA) framework [2]. This procedure uses a two-step online-offline approach, similar to what is done in CluStream [3]: first, an online step summarizes the incoming data points into microclusters, and then an offline step performs the clustering on the basis of these summaries.
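The microcluster idea can be sketched as follows, in the spirit of CluStream [3] but much simplified. This is an assumption-laden illustration, not the authors' procedure: the distance threshold `max_dist`, the function names `summarize` and `cluster_summaries`, and the use of weighted k-means over microcluster centers are all choices made here for the example, and the temporal statistics kept by CluStream are omitted.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Additive summary of absorbed points: count, linear sum, squared sum."""

    def __init__(self, point: Sequence[float]):
        self.n = 1
        self.ls = list(point)
        # Squared sums are kept so a variance or radius could be derived later.
        self.ss = [x * x for x in point]

    def absorb(self, point: Sequence[float]) -> None:
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def center(self) -> List[float]:
        return [s / self.n for s in self.ls]


def summarize(stream, max_dist: float) -> List[MicroCluster]:
    """Single pass: absorb each point into the nearest microcluster,
    or open a new one if none lies within max_dist (hypothetical threshold)."""
    micro: List[MicroCluster] = []
    for p in stream:
        nearest = min(micro, key=lambda m: math.dist(m.center(), p), default=None)
        if nearest is not None and math.dist(nearest.center(), p) <= max_dist:
            nearest.absorb(p)
        else:
            micro.append(MicroCluster(p))
    return micro


def cluster_summaries(micro: List[MicroCluster], k: int, iterations: int = 10):
    """Weighted k-means over microcluster centers (weights are point counts)."""
    k = min(k, len(micro))
    centers = [m.center() for m in micro[:k]]
    dims = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        weights = [0] * k
        for m in micro:
            c = m.center()
            j = min(range(k), key=lambda i: math.dist(centers[i], c))
            weights[j] += m.n
            for d in range(dims):
                sums[j][d] += m.n * c[d]
        centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
    return centers


stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
micro = summarize(stream, max_dist=1.0)
centers = cluster_summaries(micro, k=2)
```

Because the summaries are additive, the final clustering never needs to revisit the raw points, which is what makes the two-step design attractive for streams.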

The tested subspace clustering algorithms are taken from the literature [3,4]. Unfortunately, the experiments, which together with the evaluation measure should constitute the main contribution of the paper, are rather disappointing: the authors consider only two synthetic data sets with three dimensions. This is far from the paper’s initial claim to deal with “high dimensionality.”


1) Kremer, H.; Kranen, P.; Jansen, T.; Seidl, T.; Bifet, A.; Holmes, G.; Pfahringer, B. An effective evaluation measure for clustering on evolving data streams. In Proc. of KDD. ACM, 2011, 868–876.

2) Hassani, M.; Kim, Y.; Seidl, T. Database systems for advanced applications. Springer, 2013.

3) Aggarwal, C. C.; Han, J.; Wang, J.; Yu, P. S. A framework for clustering evolving data streams. In Proc. of VLDB. Springer, 2003, 81–92.

4) Aggarwal, C. C.; Wolf, J. L.; Yu, P. S.; Procopiuc, C.; Park, J. S. Fast algorithms for projected clustering. SIGMOD Record 28, 2 (1999), 61–72.

Reviewer:  Julien Velcin Review #: CR144473 (1608-0597)
