In unsupervised machine learning, algorithms based on the classical k-means framework converge only to locally optimal solutions, so the quality of the result depends on the initial partition of the data. In this paper, the authors propose a new approach to the problem of initializing the centroids. They proceed in two steps. First, they build a new 2D representation space that maximizes the variance, similar to a factorial decomposition. Then, they use an iterative process that, at each step, selects the farthest data point as the next centroid.
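To make the two steps concrete, here is a minimal sketch of the general idea, not the authors' actual method. It rests on several assumptions that do not come from the paper: a plain PCA projection (via SVD) stands in for the authors' 2D representation, the first centroid is taken to be the point farthest from the data mean, and "farthest" at later steps is read as farthest from the nearest centroid already chosen. The function names `project_2d` and `farthest_point_init` are hypothetical.

```python
import numpy as np

def project_2d(X):
    """Project X onto its two directions of largest variance (plain PCA).

    Assumption: this stands in for the paper's 2D representation space.
    X must have at least two rows and two columns.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                         # coordinates on the top-2 axes

def farthest_point_init(X, k):
    """Pick k initial centroids by repeatedly taking the farthest point.

    Assumptions: the first centroid is the point farthest from the mean,
    and each later one maximizes the distance to its nearest chosen centroid.
    Returns the centroids in the original space and their row indices.
    """
    Z = project_2d(X)
    # Z is centered, so the norm of a row is its distance from the mean.
    first = int(np.argmax(np.linalg.norm(Z, axis=1)))
    chosen = [first]
    min_dist = np.linalg.norm(Z - Z[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))           # farthest from all chosen so far
        chosen.append(nxt)
        # Update each point's distance to its nearest chosen centroid.
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return X[chosen], chosen
```

The returned centroids could then be passed to any k-means implementation as its starting points. Because the selection is deterministic, repeated runs on the same data give the same initialization, at the cost of some sensitivity to outliers.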
The authors perform experiments on five datasets from the University of California, Irvine (UCI) Machine Learning Repository, and compare their results to other state-of-the-art approaches. They show that their approach is rather effective in retrieving the hidden true classes of the objects.
This interesting paper provides a practical and efficient solution to a well-known problem. Regrettably, the approach has not been tested on a real-world case study.