Computing Reviews

Scalable density-based clustering with quality guarantees using random projections
Schneider J., Vlachos M. Data Mining and Knowledge Discovery31(4):972-1005,2017.Type:Article
Date Reviewed: 10/30/17

Efficient clustering techniques are required for knowledge discovery in large databases. The efforts of scientists have contributed to the development of many clustering algorithms.

This research paper considers the class of density-based clustering algorithms and presents all details, both theoretical and technical, on SOPTICS--speedy OPTICS--a random-projection-based version of the popular OPTICS algorithm. The authors extend their previous work [1] in order to show theoretical arguments on the performance of SOPTICS.

In nine sections and an appendix, the reader will find a valuable description of basic density-based clustering algorithms; the technique of random projections; the steps of the proposed algorithm (including pseudocode); and theoretical results on the speed of algorithms used in different steps like partitioning, neighborhood identification, density estimate, and so on. Starting with the seventh section, a deep analysis is conducted on SOPTICS. Twelve theorems are developed to prove different aspects of the proposed strategy. In section 8, an empirical evaluation is described related to both runtime and clustering quality over ten datasets. SOPTICS showed clear performance advantages when compared against basic OPTICS, OPTICS with locality sensitive hashing, and DeLi-Clu.

The Java implementation of SOPTICS is available at the second author’s website [2], and the previous version can be found as included by the ELKI project [3].

Even for a long paper, it is to the authors’ merit that they include only the necessary background and references for a good understanding of the context, proofs, and SOPTICS.

I highly recommend this contribution to data scientists, researchers in data mining, and students pursuing master’s or PhD degrees doing research in the knowledge discovery field.


1)

Schneider, J.; Vlachos, M. Fast parameterless density-based clustering via random projections. In Proc. of the 22nd ACM International Conference on Information & Knowledge Management (San Francisco, CA), ACM, New York, NY, 2013, 861–866.


2)

Schneider, J.; Vlachos, M. Scalable density-based clustering with quality guarantees using random projections, source code. 2013. http://alumni.cs.ucr.edu/~mvlachos/erc/projects/density-based/src.zip. Accessed 10/03/2017.


3)

ELKI project. https://elki-project.github.io/releases/ (10/03/2017).

Reviewer:  G. Albeanu Review #: CR145627 (1801-0025)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™
Terms of Use
| Privacy Policy