Ensemble clustering methods address a number of issues that arise when identifying clusters in data. Using sets, or ensembles, of clustering results generated by multiple parameterizations or different algorithms can make solutions more robust, reveal novel clusters that no single algorithm finds, and provide confidence estimates on cluster membership [1]. Ensemble clustering is widely used and resembles techniques such as random forests, in which classifiers are built from sets of randomly generated decision trees that vote on the class.
This paper addresses clustering in high-dimensional space by projecting multiple clustering results into a vector space. The ensemble clustering is then calculated rapidly by identifying the median in that vector space, which can be done efficiently using the Weiszfeld algorithm.
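For readers unfamiliar with the Weiszfeld algorithm, the following is a minimal sketch of the standard iteration for the geometric median of a set of vectors. It is not the authors' implementation: the function name `weiszfeld_median`, its parameters, and the small `eps` guard are assumptions made for illustration, and how the paper embeds clustering results as vectors is not specified in this review.

```python
import math

def weiszfeld_median(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the geometric median of equal-length point tuples."""
    dim = len(points[0])
    # Start from the centroid (coordinate-wise mean).
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        # Weight each point by its inverse distance to the current estimate.
        # eps (an illustrative guard) avoids division by zero when the
        # estimate coincides with a data point.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        # Stop once the estimate moves less than the tolerance.
        if math.dist(updated, estimate) < tol:
            return updated
        estimate = updated
    return estimate
```

Each iteration is a reweighted average over the input vectors, so the cost per step grows linearly with the number of vectors and their dimension. Unlike the mean, the resulting median is only mildly affected by a single outlying vector.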
Across many simulated and real datasets, the ensemble clustering in this paper compares favorably with other state-of-the-art methods. While it does not outperform its best competitors, which are based on the co-association matrix coupled with hierarchical clustering, the method is almost three times faster for equivalent clustering performance.
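As background on the competing approach, a co-association matrix is commonly defined as the fraction of clusterings in which each pair of points shares a cluster. The sketch below follows that common definition; the function name `co_association` and the list-of-label-lists input are assumptions for illustration, and the exact variant used by the methods compared in the paper is not described in this review.

```python
def co_association(labelings):
    """Fraction of clusterings that place each pair of points together.

    labelings: list of label lists, one per clustering, all the same length.
    """
    n_points = len(labelings[0])
    n_runs = len(labelings)
    return [
        [
            sum(run[i] == run[j] for run in labelings) / n_runs
            for j in range(n_points)
        ]
        for i in range(n_points)
    ]
```

A consensus partition is then typically obtained by running hierarchical clustering on one minus this matrix. Because the matrix has one entry per pair of points, its size grows quadratically with the number of points, which may contribute to the speed difference reported above.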
Given its speed, the method appears worthy of consideration for anyone who needs ensemble clustering in large, high-dimensional datasets. In addition, the projection method for obtaining ensemble results looks very promising across many machine learning domains. As the authors note, a remaining area worthy of exploration is the effect of different distance functions on the ensembles, which raises the issue of robustness when the correct distance is difficult to know a priori.