Categorizing texts into predefined classes is an important problem in many fields, including data mining, natural language processing, and machine learning. In this paper, the authors build on previous work investigating how constructing new features by means of clustering can boost classification performance [1].
The authors attempt to make two contributions: taking the entire test set into account during the clustering step, and relaxing the one-class-to-one-cluster constraint. The first is time-consuming and prone to overfitting, as observed in the experiments on three classical datasets. Regarding the second, the authors seem to have overlooked a large part of the relevant literature. In particular, a sustained effort has been undertaken in the topic modeling field: latent Dirichlet allocation (LDA)-based dimensionality reduction has been shown to improve text categorization [2], which the authors of this paper confirm, and more recent work builds on a semi-supervised version of LDA for regression and classification tasks [3].
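For context, the general idea underlying the line of work in [1] can be sketched as follows: cluster the documents in feature space, then append each document's distance to every cluster centroid as additional features before training a classifier. This is a minimal, hypothetical illustration with plain k-means, not the authors' actual method:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns the cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def augment_with_cluster_features(X, centroids):
    """Append distance-to-centroid columns as new features."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.hstack([X, d])

# toy usage: 4 documents with 2 original features, augmented with
# distances to k=2 cluster centroids
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
C = kmeans(X, k=2)
X_aug = augment_with_cluster_features(X, C)
print(X_aug.shape)  # (4, 4): 2 original features + 2 cluster distances
```

The augmented matrix `X_aug` would then be fed to any standard classifier; the clustering step proposed in the paper differs mainly in which documents are clustered and how clusters relate to classes.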
In conclusion, the contribution here is marginal, as the authors offer little novel insight into the task of text categorization.