Computing Reviews
A rapid hybrid clustering algorithm for large volumes of high dimensional data
Rathore P., Kumar D., Bezdek J., Rajasegarar S., Palaniswami M. IEEE Transactions on Knowledge and Data Engineering 31(4): 641-654, 2019. Type: Article
Date Reviewed: Mar 10 2020

FensiVAT is a rapid hybrid clustering algorithm that identifies clusters in large datasets characterized by many instances (N) and multiple features (p) in each instance.

FensiVAT improves on popular algorithms that rely on random sampling, such as clustering large applications (CLARA) using k-means, clustering using representatives (CURE), and clustering with improved visual assessment of tendency (clusiVAT), and on algorithms that reduce dimensionality by projecting the data onto a lower-dimensional space, such as CLIQUE and PROCLUS. These approaches suffer from space and/or time complexity issues.

FensiVAT integrates random projection with the visual assessment of cluster tendency (VAT). It samples the data with a scheme called maximin and random sampling (MMRS), working on matrices obtained by randomly projecting the dataset into a lower-dimensional space, and aggregates multiple distances using principal component analysis (PCA) and linear discriminant analysis (LDA).
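The two building blocks named here, random projection and maximin-then-random sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Gaussian projection matrix, and the proportional allocation of random draws to each maximin group are assumptions made for illustration.

```python
import numpy as np

def random_projection(X, q, rng):
    """Project the N x p matrix X to q dimensions with a Gaussian random matrix
    (an assumed choice; other random matrices are possible)."""
    p = X.shape[1]
    R = rng.standard_normal((p, q)) / np.sqrt(q)
    return X @ R

def maximin_sample(Y, k, rng):
    """Pick k indices: start at a random point, then repeatedly add the point
    whose distance to the already chosen points is largest."""
    first = int(rng.integers(Y.shape[0]))
    chosen = [first]
    min_dist = np.linalg.norm(Y - Y[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(Y - Y[nxt], axis=1))
    return chosen

def mmrs_sample(Y, k, n, rng):
    """Maximin then random sampling (hypothetical variant): group every point
    with its nearest maximin seed, then draw roughly n points in total,
    proportionally to group size."""
    seeds = maximin_sample(Y, k, rng)
    # N x k matrix of distances from every point to every seed
    d = np.linalg.norm(Y[:, None, :] - Y[seeds][None, :, :], axis=2)
    labels = np.argmin(d, axis=1)
    sample = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        m = min(len(members), max(1, int(round(n * len(members) / len(Y)))))
        sample.extend(rng.choice(members, size=m, replace=False).tolist())
    return np.array(sorted(sample))
```

A call such as `mmrs_sample(random_projection(X, 5, rng), k=3, n=30, rng=rng)` returns the indices of a small sample that still covers each far-apart region of the projected data, which is the property a sampling-based clustering method relies on.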

The authors’ ten-step algorithm includes input, dataset generation in down-space, near-MMRS sampling, reduced image (iVAT) generation, application of VAT/iVAT to distance matrices, clustering, and extension in down-space. They apply FensiVAT to the US Census 1990, KDD CUP, FOREST, MiniBooNE, MNIST, and ACT datasets. FensiVAT is an order of magnitude faster than clusiVAT and several orders of magnitude faster than the other approaches without compromising accuracy.
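For readers unfamiliar with VAT, the reordering step that the algorithm applies to distance matrices can be sketched as follows. This is a plain version of the classic VAT ordering on a small Euclidean distance matrix, offered only as an illustration; the names `vat_order` and `vat_image` are hypothetical, and the sketch omits the iVAT transformation and everything specific to FensiVAT.

```python
import numpy as np

def vat_order(D):
    """Return the VAT ordering of a symmetric dissimilarity matrix D.
    Start from one endpoint of the largest dissimilarity, then repeatedly
    append the unvisited point closest to any visited point."""
    n = D.shape[0]
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    i = int(i)
    order = [i]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    min_dist = D[i].copy()  # distance from each point to the visited set
    for _ in range(n - 1):
        masked = np.where(remaining, min_dist, np.inf)
        j = int(np.argmin(masked))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)

def vat_image(X):
    """Build the Euclidean distance matrix of X and reorder it with VAT.
    Displayed as a grayscale image, well-separated clusters appear as dark
    blocks along the diagonal."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Because the full matrix costs memory quadratic in the number of points, this kind of reordering is only practical on a small sample, which is why sampling precedes it in the pipeline described above.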

This well-written paper has 55 references and will interest the big data community.

Reviewer:  Anoop Malaviya Review #: CR146925 (2007-0171)
  Editor Recommended
 
 
Clustering (H.3.3 ... )
 