Computing Reviews
An improved clustering ensemble method based link analysis
Hao Z., Wang L., Cai R., Wen W. World Wide Web 18(2): 185-195, 2015. Type: Article
Date Reviewed: May 27 2015

This paper presents WETU, a method for improving clustering ensembles. Existing clustering ensemble methods use measures such as the weighted connection-triple (WCT), the weighted triple-quality (WTQ), and the combined similarity measure (CSM), which combines WCT and WTQ, to quantify the relations among data points within a cluster. The proposed method additionally considers the relations among the clusters, so that the resultant clusters are more accurate, stable, and meaningful.

The general method for clustering ensembles works as follows. First, a base algorithm, such as k-means, clusters the raw data several times under different initialization conditions, and each run yields a different collection of clusters. Then, a link analysis is performed over the resulting collections. Here, “links” refers to the relations among clusters produced under the same initial condition (one run) and under different conditions (different runs). Within a run, because the clustering is hard (that is, each data point belongs to exactly one cluster), there is no explicit link among different clusters. WETU measures the similarity between two clusters based on their common neighboring clusters in different runs.
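The link structure described above can be illustrated with a minimal sketch. It assumes the base clusterings are already available as label lists (one list per run) and uses the Jaccard overlap of cluster memberships as the link weight. That weight, the helper names `clusters_from_runs` and `link_weights`, and the sample labels are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations

def clusters_from_runs(runs):
    """Map (run index, label) -> set of data-point indices in that cluster."""
    clusters = {}
    for r, labels in enumerate(runs):
        for i, lab in enumerate(labels):
            clusters.setdefault((r, lab), set()).add(i)
    return clusters

def link_weights(clusters):
    """Jaccard overlap for every pair of clusters that share at least one point."""
    weights = {}
    for a, b in combinations(sorted(clusters), 2):
        shared = len(clusters[a] & clusters[b])
        if shared:
            weights[(a, b)] = shared / len(clusters[a] | clusters[b])
    return weights

# Hypothetical labels from two runs of a base algorithm over six data points.
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
clusters = clusters_from_runs(runs)
weights = link_weights(clusters)

# Hard clustering means two clusters from the same run never share a point,
# so every link found here connects clusters from different runs.
same_run_links = [pair for pair in weights if pair[0][0] == pair[1][0]]
```

In this toy case `same_run_links` is empty, which is the situation the review describes: any similarity between clusters of one run has to be inferred through clusters of other runs.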

Specifically, WETU measures the relation between any two clusters X and Y as a fraction f(X,Y,Z)/g(X,Y,Z), assuming cluster Z has links to both X and Y. The numerator contains, as its main factor, the number of weighted links between (X,Y) and (Y,Z). The denominator measures the weighted links between Z and the rest of the collection. The larger the numerator, the more elements X and Y have in common; the larger the denominator, the smaller the contribution of cluster Z to the commonality between X and Y. The novelty of WETU is its ability to measure commonality based on the neighbors of clusters that do not have direct common points.
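The general shape of such a neighbor-based measure can be sketched as follows. This is not the paper's f and g, which the review does not spell out. As a stand-in, the sketch takes the smaller of the two link weights through a common neighbor Z as the numerator and Z's total link weight as the denominator, then sums over all common neighbors. The function names and the sample weights are hypothetical.

```python
def neighbors(weights):
    """Build an adjacency map {cluster: {neighbor: weight}} from undirected links."""
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def triple_similarity(adj, x, y):
    """Sum over common neighbors z of (shared link weight) / (z's total link weight)."""
    total = 0.0
    common = set(adj.get(x, {})) & set(adj.get(y, {}))
    for z in common:
        numerator = min(adj[x][z], adj[y][z])  # assumed stand-in for f(X,Y,Z)
        denominator = sum(adj[z].values())     # assumed stand-in for g(X,Y,Z)
        total += numerator / denominator
    return total

# Hypothetical link weights. X and Y have no direct link, but both link to Z1 and Z2.
weights = {
    ("X", "Z1"): 0.5,
    ("Y", "Z1"): 0.25,
    ("X", "Z2"): 0.2,
    ("Y", "Z2"): 0.2,
    ("Z2", "W"): 0.6,
}
adj = neighbors(weights)
sim_xy = triple_similarity(adj, "X", "Y")
```

Here Z2 links to a third cluster W, so its total link weight is larger and it contributes less to the similarity of X and Y than Z1 does, which mirrors the role the review assigns to the denominator.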

The authors used six datasets, two synthetic and four real, to compare the different methods. The datasets range in size from 150 to 2,500 data points, with 10 to 60 features. The methods being compared include k-means clustering (KMC), base clustering I, CSM + global k-means clustering (CSM+GKMC), WTU+GKMC, and WETU+GKMC. The evaluation measures are clustering accuracy (CA) and normalized mutual information (NMI). All results indicate that WETU outperforms the other methods.
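For readers unfamiliar with the two evaluation measures, a minimal sketch of one common definition of each follows. It assumes CA assigns every cluster its majority true class (often called purity) and that NMI is normalized by the geometric mean of the two entropies. The paper may define either measure differently, and the function names are illustrative.

```python
from collections import Counter
from math import log, sqrt

def clustering_accuracy(true_labels, pred_labels):
    """Fraction of points whose true class is the majority class of their cluster."""
    by_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        by_cluster.setdefault(p, Counter())[t] += 1
    correct = sum(max(counts.values()) for counts in by_cluster.values())
    return correct / len(true_labels)

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(true_labels, pred_labels):
    """Mutual information divided by the geometric mean of the two entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = sqrt(entropy(true_labels) * entropy(pred_labels))
    return mi / denom if denom > 0 else 0.0

# Hypothetical ground truth and two candidate clusterings.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]    # same partition, different label names
imperfect = [0, 0, 1, 1, 1, 1]  # one point placed in the wrong cluster
```

Both measures ignore how clusters are named, so `perfect` scores the maximum on each even though its labels are swapped relative to `truth`.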

In summary, this work introduces a new and effective method to manage clustering ensembles. WETU offers a different perspective for researchers in the clustering ensemble area. The contribution is significant, but the writing of the paper could have been improved to more effectively convey the information.

Reviewer: Xiannong Meng Review #: CR143476 (1508-0718)
Clustering (H.3.3 ... )
 
 
World Wide Web (WWW) (H.3.4 ... )
 