Computing Reviews
A rapid hybrid clustering algorithm for large volumes of high dimensional data
Rathore P., Kumar D., Bezdek J., Rajasegarar S., Palaniswami M.  IEEE Transactions on Knowledge and Data Engineering 31 (4): 641-654, 2019. Type: Article
Date Reviewed: Mar 10 2020

FensiVAT is a rapid hybrid clustering algorithm that identifies clusters in large datasets characterized by many instances (N) and multiple features (p) in each instance.

FensiVAT improves on two families of popular algorithms. The first relies on random sampling, such as clustering large applications (CLARA) using k-means, clustering using representatives (CURE), and clustering with improved visual assessment of tendency (clusiVAT). The second reduces dimensionality by projecting the data onto a lower-dimensional space, such as CLIQUE and PROCLUS. These approaches suffer from space and/or time complexity issues.

FensiVAT integrates random projection with the visual assessment of cluster tendency (VAT). It draws samples with a scheme called maximin and random sampling (MMRS), applied after random projection of the dataset into a lower-dimensional space, and aggregates multiple distance matrices using principal component analysis (PCA) and linear discriminant analysis (LDA).
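To make the two ingredients named above concrete, here is a minimal sketch of a Gaussian random projection followed by a maximin-then-random sampling step. It is an illustration under stated assumptions, not the authors' implementation: the function names (random_project, maximin_sample, mmrs_sample), the Gaussian projection matrix, and the proportional allocation of samples to groups are all choices made for this example.

```python
import numpy as np

def random_project(X, q, rng):
    """Project N x p data onto q dimensions with a Gaussian random matrix.
    Scaling by 1/sqrt(q) preserves squared norms in expectation."""
    p = X.shape[1]
    R = rng.standard_normal((p, q)) / np.sqrt(q)
    return X @ R

def maximin_sample(Y, k, rng):
    """Greedy maximin (farthest-point) selection of k row indices of Y."""
    first = int(rng.integers(Y.shape[0]))
    chosen = [first]
    # Distance from every point to its nearest chosen point so far.
    nearest = np.linalg.norm(Y - Y[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(Y - Y[nxt], axis=1))
    return np.array(chosen)

def mmrs_sample(Y, k, n, rng):
    """Assumed MMRS scheme: maximin picks k anchors, every point joins its
    nearest anchor's group, and roughly n points are drawn at random,
    allocated to groups in proportion to group size."""
    anchors = maximin_sample(Y, k, rng)
    d = np.linalg.norm(Y[:, None, :] - Y[anchors][None, :, :], axis=2)
    group = np.argmin(d, axis=1)
    picks = []
    for g in range(k):
        members = np.flatnonzero(group == g)
        if members.size == 0:  # possible only if anchors coincide
            continue
        m = min(members.size, max(1, int(round(n * members.size / Y.shape[0]))))
        picks.append(rng.choice(members, size=m, replace=False))
    return np.concatenate(picks)
```

The sampling runs on the projected data Y, so the cost of each distance computation depends on q rather than on the original dimension p.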

The authors’ ten-step algorithm includes input, dataset generation in the downspace, near-MMRS sampling, reduced image (iVAT) generation, application of VAT/iVAT to distance matrices, clustering, and extension in the downspace. They apply FensiVAT to the US Census 1990, KDD CUP, FOREST, MiniBooNE, MNIST, and ACT datasets. FensiVAT is an order of magnitude faster than clusiVAT and several orders of magnitude faster than the other approaches, without compromising accuracy.
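As a rough illustration of the VAT step mentioned above, the following sketch reorders a dissimilarity matrix so that mutually close points sit next to each other, which makes clusters appear as dark blocks along the diagonal when the matrix is displayed as an image. It covers only the basic VAT reordering, not iVAT or the full FensiVAT pipeline, and the names vat_order and vat_image are hypothetical.

```python
import numpy as np

def vat_order(D):
    """Return a VAT ordering of a symmetric dissimilarity matrix D.
    The ordering grows like Prim's minimum spanning tree algorithm."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # Smallest dissimilarity from each point to the already ordered set.
    closest = D[start].astype(float)
    for _ in range(n - 1):
        masked = np.where(remaining, closest, np.inf)
        nxt = int(np.argmin(masked))
        order.append(nxt)
        remaining[nxt] = False
        closest = np.minimum(closest, D[nxt])
    return np.array(order)

def vat_image(D):
    """Return the reordered matrix and the ordering that produced it."""
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

With two well-separated groups whose members are interleaved in the input, the ordering places each group in one contiguous run, so the reordered matrix shows two diagonal blocks.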

This well-written paper has 55 references and will interest the big data community.

Reviewer:  Anoop Malaviya Review #: CR146925
  Editor Recommended
Clustering (H.3.3 ... )
