Computing Reviews
Mixture model averaging for clustering
Wei Y., McNicholas P. Advances in Data Analysis and Classification 9(2): 197-217, 2015. Type: Article
Date Reviewed: Sep 30 2015

Clustering is a popular task in data analysis. Because it has a broad range of applications, many clustering approaches have been developed, and these methods often produce different clusterings of the same dataset. This is not necessarily a disadvantage, since clustering is largely exploratory. In practice, however, it is still very useful to determine which of these different clusterings of the same data is likely to be the “best.” To make formal inference possible, rather than relying on heuristics, researchers have studied model-based clustering methods. As the authors report, Gaussian mixtures dominated the model-based clustering literature until recently. With Gaussian mixtures, one typically fits several models from a family, selects the best one (most often with the Bayesian information criterion, BIC), and reports clustering results from that model only. The rest are discarded.
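The select-then-discard workflow described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes each candidate model has already been fitted (for example, by EM), it uses the BIC convention common in the mixture-model literature, BIC = 2 log L - p log n with larger values preferred, and the model names, log-likelihoods, and parameter counts are hypothetical.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC in the mixture-model convention: 2*logL - p*log(n); larger is better."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

def select_best(models: dict[str, tuple[float, int]], n_obs: int) -> tuple[str, dict[str, float]]:
    """Score every fitted model and return the name of the one with the largest BIC."""
    scores = {name: bic(ll, p, n_obs) for name, (ll, p) in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical fitted models: name -> (maximized log-likelihood, number of free parameters)
fits = {
    "G=2": (-1520.4, 11),
    "G=3": (-1490.7, 17),
    "G=4": (-1486.9, 23),
}
best, scores = select_best(fits, n_obs=300)
print(best)  # G=3
```

With these hypothetical numbers, the four-component model has the highest likelihood but pays a larger parameter penalty, so the three-component model has the largest BIC. Under the usual workflow, only that model's clustering would be reported and the other two fits would be discarded.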

Instead of throwing away models that may not be the best, this work reports two strategies for averaging multiple models. The models to be averaged are those close to the apparent best, and their clustering results are combined as a weighted average. The paper describes methods for merging mixture components, averaging a posteriori probabilities, and merging models. The effectiveness of the proposed approaches is tested, with comparisons, on seven real-world datasets and three simulated datasets. The authors claim that their approaches are very fast and can be applied to very large datasets, although no analysis of computational complexity is provided.
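One way to picture averaging a posteriori probabilities is the following sketch. It is an illustration under stated assumptions rather than the paper's exact method: the weights use the standard BIC approximation to posterior model probabilities, w_k ∝ exp(BIC_k / 2); the `window` cutoff that decides which models count as close to the best is a hypothetical parameter; and all models are assumed to have the same number of components with component labels already matched, which in practice may require merging components first. The BIC values and membership matrices are made up.

```python
import math

def bma_weights(bics: list[float], window: float) -> list[float]:
    """Approximate posterior model weights from BIC values (2*logL - p*log n, larger is better).

    Models whose BIC is more than `window` below the best get zero weight;
    `window` is a hypothetical Occam's-window-style cutoff.
    """
    best = max(bics)
    raw = [math.exp((b - best) / 2.0) if best - b <= window else 0.0 for b in bics]
    total = sum(raw)  # always > 0 because the best model contributes exp(0) = 1
    return [r / total for r in raw]

def average_memberships(zs: list[list[list[float]]], weights: list[float]) -> list[list[float]]:
    """Weighted average of n x G a posteriori membership matrices.

    Assumes every model has the same G and that component labels already correspond.
    """
    n, g = len(zs[0]), len(zs[0][0])
    return [[sum(w * z[i][k] for w, z in zip(weights, zs)) for k in range(g)]
            for i in range(n)]

def hard_labels(z: list[list[float]]) -> list[int]:
    """Assign each observation to its highest-probability component (MAP)."""
    return [max(range(len(row)), key=row.__getitem__) for row in z]

# Hypothetical BIC values and membership matrices for three fitted two-component models
bics = [-100.0, -102.0, -130.0]
zs = [
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]],
    [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]],
]
weights = bma_weights(bics, window=10.0)
z_avg = average_memberships(zs, weights)
print([round(w, 3) for w in weights])  # [0.731, 0.269, 0.0]
print(hard_labels(z_avg))              # [0, 1, 1]
```

Here the third model falls outside the window and receives zero weight, so the averaged memberships come from the first two models only. The second observation is assigned to component 1 even though the second model alone would put it in component 0, which shows how averaging can change individual assignments relative to any single model.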

Practitioners in the fields of data mining, machine learning, and modeling should benefit from reading this paper, because end users often ask data analysts for a single clustering result that is most likely to be the best. The mixture model averaging approaches reported here offer an alternative to simply reporting the single best model chosen by the Bayesian information criterion. Researchers working on model-based clustering and information theory will also benefit from reading this paper.

Reviewer: Chenyi Hu (Featured Reviewer)
Review #: CR143812 (1512-1061)
 
Clustering (H.3.3 ...)
Data Mining (H.2.8 ...)
Data Models (H.2.1 ...)
Learning (I.2.6)
 
Other reviews under "Clustering":
Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
Can F. (ed), Ozkarahan E. ACM Transactions on Database Systems 15(3): 483-517, 1990. Type: Article
Reviewed: Dec 1 1992
A parallel algorithm for record clustering
Omiecinski E., Scheuermann P. ACM Transactions on Database Systems 15(3): 599-624, 1990. Type: Article
Reviewed: Nov 1 1992
Organization of clustered files for consecutive retrieval
Deogun J., Raghavan V., Tsou T. ACM Transactions on Database Systems 9(4): 646-671, 1984. Type: Article
Reviewed: Jun 1 1985

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®