Most of the clustering literature deals with numeric data. This paper presents a new algorithm for clustering categorical data that follows an “old school” top-down procedure. The main idea is very similar to clustering trees [1], with one difference: the splitting criterion is based on the average information gain of the attributes, here called the mean gain ratio (MGR).
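To make the criterion concrete, here is a minimal sketch of one plausible reading of it, not the paper's exact definition. It assumes that the score of a candidate attribute is the gain ratio it achieves on each of the other attributes, averaged over those attributes, and that the top-down procedure splits on the attribute with the highest score. The function names (`mean_gain_ratio`, `best_split_attribute`) and the tuple-based data layout are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """Entropy of `target` that remains once `given` is known."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def gain_ratio(target, split):
    """Information gain of `split` on `target`, normalized by the entropy of `split`."""
    split_info = entropy(split)
    if split_info == 0:  # a constant attribute cannot split anything
        return 0.0
    return (entropy(target) - conditional_entropy(target, split)) / split_info


def mean_gain_ratio(rows, i):
    """Assumed MGR: average gain ratio of attribute i over all other attributes."""
    cols = list(zip(*rows))
    others = [j for j in range(len(cols)) if j != i]
    if not others:
        return 0.0
    return sum(gain_ratio(cols[j], cols[i]) for j in others) / len(others)


def best_split_attribute(rows):
    """Index of the attribute with the highest assumed MGR."""
    return max(range(len(rows[0])), key=lambda i: mean_gain_ratio(rows, i))


# Toy data: attributes 0 and 1 determine each other, attribute 2 is unrelated.
rows = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
split_on = best_split_attribute(rows)
```

On the toy data, attributes 0 and 1 score higher than attribute 2 because each fully predicts the other, so a split on either one yields two homogeneous groups. A full top-down procedure would then recurse on the resulting partitions; that step is omitted here because the review does not describe it.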
The contribution looks marginal, even though the experiments show that the proposal outperforms three previous approaches on nine University of California at Irvine (UCI) benchmarks and on artificial datasets. Most of the references are about ten years old, and modern data mining issues (for example, numerous attributes, heterogeneous and linked data) seem to be outside the scope of MGR. Experiments on a recent, real-world case study would have helped convince the reader of the relevance of yet another clustering algorithm.