Categorizing texts into predefined classes is an important problem in many fields, including data mining, natural language processing, and machine learning. In this paper, the authors build on previous work investigating how constructing new features by means of clustering can boost classification performance [1].
The authors attempt to make two contributions: taking the entire test set into account during the clustering step, and relaxing the one-class-to-one-cluster constraint. The first is time-consuming and prone to overfitting, as observed in the experiments on three classical datasets. Regarding the second, the authors seem to have overlooked a large part of the relevant literature. In particular, a sustained effort has been undertaken in the topic modeling field: latent Dirichlet allocation (LDA)-based dimensionality reduction has been shown to improve text categorization [2], which the authors of this paper confirm, and more recent work builds on a semi-supervised version of LDA for regression and classification tasks [3].
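For context, the general idea underlying the line of work in [1] can be sketched as follows: cluster the documents in feature space, then append each document's distance to every cluster centroid as additional features before training a classifier. This is a minimal, hypothetical illustration with plain k-means, not the authors' actual method:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns the cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def augment_with_cluster_features(X, centroids):
    """Append distance-to-centroid columns as new features."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.hstack([X, d])

# toy usage: 4 documents with 2 original features, augmented with
# distances to k=2 cluster centroids
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
C = kmeans(X, k=2)
X_aug = augment_with_cluster_features(X, C)
print(X_aug.shape)  # (4, 4): 2 original features + 2 cluster distances
```

The augmented matrix `X_aug` would then be fed to any standard classifier; the clustering step proposed in the paper differs mainly in which documents are clustered and how clusters relate to classes.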
In conclusion, the contribution here is marginal, as the authors offer little novel insight into the task of text categorization.