Computing Reviews
The impact of semi-supervised clustering on text classification
Kyriakopoulou A., Kalamboukis T. PCI 2013 (Proceedings of the 17th Panhellenic Conference on Informatics, Thessaloniki, Greece, Sep 19-21, 2013): 180-187, 2013. Type: Proceedings
Date Reviewed: Apr 21 2014

Categorizing texts into predefined classes is an important issue relevant to many fields, including data mining, natural language processing, and machine learning. In this paper, the authors build on previous work that investigates how building new features by means of clustering can boost classification performance [1].

The authors attempt to make two contributions: taking the entire test set into account during the clustering step, and relaxing the constraint that each class corresponds to exactly one cluster. The first is time-consuming and prone to overfitting, as the experiments on three classical datasets show. As for the second, the authors seem to have ignored a large part of the related literature. In particular, a sustained effort has been undertaken in the topic modeling field. Latent Dirichlet allocation (LDA)-based dimensionality reduction shows an improvement for the task of text categorization [2], a result that the authors of this paper confirm. Some recent work builds on a supervised, maximum-margin version of LDA for the tasks of regression and classification [3].
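To make the two ideas at the start of the preceding paragraph concrete, here is a minimal sketch of clustering-derived features. It is not the authors' algorithm: plain k-means is swapped in as an assumption, and the function names (`kmeans`, `add_cluster_features`) and the toy term-count vectors are hypothetical. Training and test documents are clustered together, the number of clusters `k` may exceed the number of classes, and each document vector is extended with a one-hot cluster indicator.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialize centers from k randomly chosen input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_features(train, test, k):
    """Cluster train and test vectors together, then append a
    one-hot cluster indicator to every vector."""
    points = train + test  # the test set takes part in the clustering
    labels = kmeans(points, k)
    augmented = [
        list(p) + [1.0 if labels[i] == c else 0.0 for c in range(k)]
        for i, p in enumerate(points)
    ]
    return augmented[: len(train)], augmented[len(train):]


# Toy term-count vectors for two classes; k=3 allows more clusters than classes.
train = [[3.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 2.0]]
test = [[3.0, 0.0], [0.0, 2.0]]
train_aug, test_aug = add_cluster_features(train, test, k=3)
```

Any classifier can then be trained on `train_aug` and applied to `test_aug`. Because the clustering sees the test documents, the features are tied to that particular test set, which is one way to read the overfitting and cost concerns raised above.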

In conclusion, the contribution here is marginal, as the authors give little novel insight into solutions for the task of text categorization.

Reviewer: Julien Velcin
Review #: CR142196 (1407-0590)

1) Kyriakopoulou, A.; Kalamboukis, T. Using clustering to enhance text classification. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007, 805-806.
2) Blei, D. M.; Ng, A. Y.; Jordan, M. I. Latent Dirichlet allocation. The Journal of Machine Learning Research 3 (2003), 993-1022.
3) Zhu, J.; Ahmed, A.; Xing, E. P. MedLDA: maximum margin supervised topic models for regression and classification. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, 1257-1264.
Clustering (I.5.3)
Induction (I.2.6 ...)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®