Computing Reviews
An improved clustering ensemble method based link analysis
Hao Z., Wang L., Cai R., Wen W.  World Wide Web 18 (2): 185-195, 2015. Type: Article
Date Reviewed: May 27 2015

This paper presents WETU, a method for improving clustering ensembles. Existing clustering ensemble methods use measures such as the weighted connection-triple (WCT), the weighted triple-quality (WTQ), and the combined similarity measure (CSM), which combines WCT and WTQ, to quantify the relations among data points within a cluster. The proposed method additionally considers the relations among the clusters themselves, so that the resulting clusters are more accurate, stable, and meaningful.

The general method for clustering ensembles works as follows. First, a base algorithm such as k-means clusters the raw data several times, each time with a different set of initial conditions; each run yields a different collection of clusters. Then, a link analysis is performed over the resulting collections. Here, “links” refers to the relations among clusters from the same run (same initial conditions) and from different runs (different initial conditions). Within a run, the clustering is hard (that is, each data point belongs to exactly one cluster), so there is no explicit link between different clusters. WETU measures the similarity between two clusters based on their common neighboring clusters in different runs.
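
The link structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two hard-coded label lists stand in for two k-means runs with different initializations, and the Jaccard overlap used as the link weight, along with the helper names `clusters_from_runs`, `link_weight`, and `build_links`, are assumptions made for illustration.

```python
from itertools import combinations

def clusters_from_runs(runs):
    """Map (run index, label) -> set of point indices for every base clustering."""
    clusters = {}
    for run, labels in enumerate(runs):
        for point, label in enumerate(labels):
            clusters.setdefault((run, label), set()).add(point)
    return clusters

def link_weight(a, b):
    """Jaccard overlap of two clusters' members (an assumed weighting scheme)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_links(clusters):
    """Keep only cluster pairs that share at least one data point."""
    links = {}
    for (key_a, a), (key_b, b) in combinations(clusters.items(), 2):
        weight = link_weight(a, b)
        if weight > 0:
            links[frozenset((key_a, key_b))] = weight
    return links

# Two hypothetical base clusterings of six points (one label list per run).
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
clusters = clusters_from_runs(runs)
links = build_links(clusters)
```

Because each run is a hard clustering, clusters from the same run never share a point, so every link in `links` connects clusters from different runs.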

Specifically, WETU measures the relation between any two clusters X and Y as a fraction f(X,Y,Z)/g(X,Y,Z), assuming cluster Z has links to both X and Y. The main factor in the numerator is the number of weighted links between (X,Z) and (Y,Z). The denominator measures the weighted links between Z and the rest of the collection. The larger the numerator, the more “common” elements X and Y share; the larger the denominator, the smaller the contribution of cluster Z to the commonality between X and Y. The novelty of WETU is its ability to measure commonality based on the neighbors of clusters that do not have direct common points.
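
The shape of this ratio can be sketched in Python. The review does not give the exact forms of f and g, so the choices below (the smaller of the two link weights through Z as the numerator, and the total weight of Z's links as the denominator) are assumptions, as are the function names and the toy link weights; this is not the WETU formula itself.

```python
def neighbors(links, node):
    """Return {neighbor: weight} for every link that touches `node`."""
    result = {}
    for pair, weight in links.items():
        if node in pair:
            (other,) = pair - {node}
            result[other] = weight
    return result

def common_neighbor_similarity(links, x, y):
    """Sum, over each common neighbor z, an assumed ratio f/g."""
    near_x, near_y = neighbors(links, x), neighbors(links, y)
    score = 0.0
    for z in near_x.keys() & near_y.keys():
        numerator = min(near_x[z], near_y[z])            # link strength shared through z
        denominator = sum(neighbors(links, z).values())  # all of z's link weight
        score += numerator / denominator
    return score

# Hypothetical weighted links; X and Y have no direct link, only the neighbor Z.
links = {
    frozenset(("X", "Z")): 0.6,
    frozenset(("Y", "Z")): 0.4,
    frozenset(("Z", "W")): 1.0,
}
similarity = common_neighbor_similarity(links, "X", "Y")
```

In this toy graph, removing the Z-W link shrinks the denominator and raises the score, which matches the behavior described above: a neighbor that is linked to much of the collection says less about what X and Y have in common.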

The authors used six datasets, two synthetic and four real, to compare the different methods. The datasets range in size from 150 to 2,500 data points, with 10 to 60 features. The methods compared include k-means clustering (KMC), base clustering I, CSM + global k-means clustering (CSM+GKMC), WTU+GKMC, and WETU+GKMC. The comparison metrics are clustering accuracy (CA) and normalized mutual information (NMI). All results indicate that WETU outperforms the other methods.
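
For readers unfamiliar with the two metrics, the following Python sketch shows one common way to compute them. The review does not state which definitions the paper uses, so both are assumptions: CA is taken as the fraction of points labeled correctly under the best one-to-one mapping of cluster labels to class labels (found here by brute force, so only practical for a few clusters), and NMI is normalized by the geometric mean of the two entropies.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt

def clustering_accuracy(truth, pred):
    """Best-match accuracy over all one-to-one label mappings (brute force)."""
    classes = sorted(set(truth))
    found = sorted(set(pred))
    # Pad with None so that extra predicted clusters can map to "no class".
    targets = classes + [None] * max(0, len(found) - len(classes))
    best = 0
    for perm in permutations(targets, len(found)):
        mapping = dict(zip(found, perm))
        hits = sum(mapping[p] == t for t, p in zip(truth, pred))
        best = max(best, hits)
    return best / len(truth)

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mutual = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denominator = sqrt(entropy(truth) * entropy(pred))
    return mutual / denominator if denominator > 0 else 0.0

truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same partition, different label names
unrelated = [0, 1, 0, 1]   # partition independent of the truth
```

Both metrics ignore the names of the cluster labels: `relabeled` scores as a perfect match, while `unrelated` gets an NMI of zero and a CA of 0.5.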

In summary, this work introduces a new and effective method to manage clustering ensembles. WETU offers a different perspective for researchers in the clustering ensemble area. The contribution is significant, but the writing of the paper could have been improved to more effectively convey the information.

Reviewer:  Xiannong Meng Review #: CR143476 (1508-0718)
Clustering (H.3.3 ... )
World Wide Web (WWW) (H.3.4 ... )
