Computing Reviews
On Using Partial Supervision for Text Categorization
Aggarwal C. (ed), Gates S., Yu P. (ed) IEEE Transactions on Knowledge and Data Engineering 16(2): 245-255, 2004. Type: Article
Date Reviewed: Apr 20 2005

The automatic classification and categorization of textual documents has been a very active research field since the 1990s. One of the main reasons for this is the considerable amount of digital information now available. Search engine users expect to obtain results through effective content-based technologies, and many experiments have demonstrated that automatic classification and categorization of text data can contribute to this objective.

Most classification techniques are effective for classifying documents into very general categories (such as the higher levels of the Yahoo! categories), but they have proven to be largely inaccurate at distinguishing fine-grained, related categories. This paper presents an innovative method that automatically categorizes documents using a supervised clustering process. First, the method uses a preexisting taxonomy to supervise, through an automatic classification process, the creation of closely related clusters. The objective is then to identify specific words that best describe the subject matter of each cluster. Finally, documents are automatically categorized into the taxonomy, including the closely related clusters.
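
To make the three steps above concrete, here is a minimal sketch of seeded (partially supervised) clustering. It is not the authors' algorithm: the term-frequency vectors, cosine similarity, centroid updates, and every name in it (seeded_clusters, top_words, categorize, the toy category labels and documents) are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average term weights over a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def seeded_clusters(labeled, unlabeled, rounds=3):
    """Step 1: taxonomy-labeled documents seed one cluster per category;
    unlabeled documents are then assigned to the nearest centroid and the
    centroids are recomputed for a fixed number of rounds."""
    seeds = defaultdict(list)
    for text, category in labeled:
        seeds[category].append(vectorize(text))
    centroids = {cat: centroid(vs) for cat, vs in seeds.items()}
    docs = [vectorize(t) for t in unlabeled]
    for _ in range(rounds):
        members = {cat: list(vs) for cat, vs in seeds.items()}
        for d in docs:
            # Ties (including zero similarity everywhere) go to the first category.
            best = max(centroids, key=lambda c: cosine(d, centroids[c]))
            members[best].append(d)
        centroids = {cat: centroid(vs) for cat, vs in members.items()}
    return centroids

def top_words(centroids, category, k=3):
    """Step 2: words whose weight in this cluster most exceeds their
    average weight in the other clusters."""
    others = [c for name, c in centroids.items() if name != category]
    def score(term):
        rest = sum(o.get(term, 0.0) for o in others) / len(others) if others else 0.0
        return centroids[category][term] - rest
    return sorted(centroids[category], key=score, reverse=True)[:k]

def categorize(text, centroids):
    """Step 3: place a new document in the most similar cluster."""
    v = vectorize(text)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

# Toy data, invented for this sketch.
labeled = [
    ("jazz saxophone improvisation concert", "Music/Jazz"),
    ("blues guitar delta concert", "Music/Blues"),
]
unlabeled = ["saxophone quartet jazz improvisation", "delta blues guitar slide"]

centroids = seeded_clusters(labeled, unlabeled)
print(top_words(centroids, "Music/Jazz"))
print(categorize("jazz saxophone solo", centroids))
```

In this toy run the shared word "concert" scores zero as a descriptor because it is equally weighted in both clusters, which is the kind of separation between closely related categories that the review describes; a realistic system would need better term weighting and far more data.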

The method was compared to a manual categorization process using the Yahoo! taxonomy. The results indicate that categorization using this novel method is generally just as good as the manual Yahoo! categorization. The main advantage of this method is that it is completely automatic (while the Yahoo! categorization is entirely manual).

The main contributions of this paper are its validation of the effectiveness of using a clustering process to build a fine-grained, related set of categories, and its demonstration that supervised clustering can be an interesting alternative to traditional categorization using a predefined set of categories.

Reviewer:  Dominic Forest Review #: CR131157 (0510-1174)
Design Methodology (I.5.2)
Text Analysis (I.2.7 ...)
Clustering (I.5.3)
Natural Language Processing (I.2.7)
