Computing Reviews, the leading online review service for computing literature.

Scalable density-based clustering with quality guarantees using random projections
Schneider J., Vlachos M. Data Mining and Knowledge Discovery 31(4): 972-1005, 2017. Type: Article

Date Reviewed: Oct 30 2017

Efficient clustering techniques are required for knowledge discovery in large databases, and many clustering algorithms have been developed to meet that need. This paper considers the class of density-based clustering algorithms and presents the theoretical and technical details of SOPTICS (speedy OPTICS), a random-projection-based version of the popular OPTICS algorithm. The authors extend their previous work [1] to give theoretical arguments on the performance of SOPTICS.

In nine sections and an appendix, the reader will find a valuable description of basic density-based clustering algorithms; the technique of random projections; the steps of the proposed algorithm (including pseudocode); and theoretical results on the speed of the algorithms used in its individual steps, such as partitioning, neighborhood identification, and density estimation. Section 7 begins an in-depth analysis of SOPTICS, in which twelve theorems are developed to prove different aspects of the proposed strategy. Section 8 describes an empirical evaluation of both runtime and clustering quality over ten datasets. SOPTICS showed clear performance advantages over basic OPTICS, OPTICS with locality-sensitive hashing, and DeLi-Clu.

The Java implementation of SOPTICS is available at the second author’s website [2], and the previous version is included in the ELKI project [3]. Although the paper is long, it is to the authors’ credit that they include only the background and references necessary for a good understanding of the context, the proofs, and SOPTICS itself. I highly recommend this contribution to data scientists, researchers in data mining, and master’s or PhD students doing research in the knowledge discovery field.
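To make the partitioning and neighborhood-identification steps mentioned above more concrete, here is a minimal Python sketch of the general idea of random-projection partitioning. It is not the authors' code and does not reproduce their pseudocode. The function names, the `min_size` and `repetitions` parameters, and the rule of cutting at a uniformly random rank are all assumptions made for illustration.

```python
import random


def random_projection_leaves(indices, points, min_size, rng):
    """Recursively split point indices along random directions (illustrative only).

    Stops when a set holds at most `min_size` indices and returns the list of leaves.
    """
    if len(indices) <= min_size:
        return [indices]
    dim = len(points[0])
    # Assumption: a random direction with Gaussian components.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def projection(i):
        return sum(d * x for d, x in zip(direction, points[i]))

    # Order the points by their projected value and cut at a random rank,
    # so both sides are non-empty and the recursion always terminates.
    order = sorted(indices, key=projection)
    cut = rng.randint(1, len(order) - 1)
    return (random_projection_leaves(order[:cut], points, min_size, rng)
            + random_projection_leaves(order[cut:], points, min_size, rng))


def candidate_neighbors(points, min_size=3, repetitions=5, seed=0):
    """Collect, for each point, the points that share a leaf with it
    in at least one of several independent random partitionings."""
    rng = random.Random(seed)
    neighbors = [set() for _ in points]
    for _ in range(repetitions):
        all_indices = list(range(len(points)))
        for leaf in random_projection_leaves(all_indices, points, min_size, rng):
            for i in leaf:
                neighbors[i].update(j for j in leaf if j != i)
    return neighbors
```

Calling `candidate_neighbors(points)` on a list of equal-length numeric tuples returns one set of candidate neighbor indices per point. A density estimate could then be derived from distances to these candidates, though how the paper does so is not described in this review.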

Reviewer: G. Albeanu	Review #: CR145627 (1801-0025)

1)	Schneider, J.; Vlachos, M. Fast parameterless density-based clustering via random projections. In Proc. of the 22nd ACM International Conference on Information & Knowledge Management (San Francisco, CA), ACM, New York, NY, 2013, 861–866.

2)	Schneider, J.; Vlachos, M. Scalable density-based clustering with quality guarantees using random projections, source code. 2013. http://alumni.cs.ucr.edu/~mvlachos/erc/projects/density-based/src.zip. Accessed 10/03/2017.

3)	ELKI project. https://elki-project.github.io/releases/. Accessed 10/03/2017.

Clustering (I.5.3)
