Computing Reviews
A hierarchical neural network document classifier with linguistic feature selection
Chen C., Lee H., Hwang C. Applied Intelligence 23(3): 277-294, 2005. Type: Article
Date Reviewed: Aug 2 2006

Over the last decade, automatic text classification (or categorization), the activity of labeling natural language texts with thematic categories [1], has been a very active research field. Various techniques have been developed to efficiently categorize large numbers of documents into predefined sets of thematic categories. Although classification algorithms have become increasingly efficient, classifying documents remains a challenge, mostly because it is a content-dependent task that involves linguistic and semantic considerations. Feature selection, the task of identifying the most discriminating and semantically relevant words for distinguishing documents, is certainly one of the most challenging steps.

This paper focuses on two important aspects of text classification: feature selection and the classification system. The authors present a feature selection methodology based on conformity and uniformity measures of words, both of which aim to evaluate term significance. Conformity indicates how a word's frequency is distributed among categories. Uniformity indicates how a word that is relevant to a specific category is distributed among the documents belonging to that category. According to these measures, words relevant to a category should be mostly present in the documents of that category (conformity), and at the same time uniformly present in each of those documents (uniformity).
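The intuition behind the two measures can be illustrated with a small sketch. The review does not give the authors' formulas, so the entropy-based definitions below are an assumption made purely for illustration, and the names normalized_entropy, conformity, and uniformity, as well as the toy corpus, are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a count vector, scaled to the range [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def conformity(term, docs_by_category):
    """Assumed definition: close to 1 when the term's occurrences are
    concentrated in one category, close to 0 when spread evenly."""
    per_category = [
        sum(doc.count(term) for doc in docs)
        for docs in docs_by_category.values()
    ]
    if sum(per_category) == 0:
        return 0.0
    return 1.0 - normalized_entropy(per_category)


def uniformity(term, docs):
    """Assumed definition: close to 1 when the term occurs equally often
    in every document of the category, lower when it is unevenly spread."""
    per_doc = [doc.count(term) for doc in docs]
    return normalized_entropy(per_doc)


# Toy corpus (hypothetical): each document is a list of tokens.
corpus = {
    "sports": [["goal", "match", "team"], ["goal", "team", "coach"], ["goal", "league"]],
    "finance": [["stock", "market", "team"], ["bond", "market"], ["stock", "bond", "stock"]],
}

for term, category in [("goal", "sports"), ("team", "sports"), ("stock", "finance")]:
    print(term, round(conformity(term, corpus), 3), round(uniformity(term, corpus[category]), 3))
```

Under these assumed definitions, "goal" scores high on both measures for the sports category, while "team" scores low on conformity because it also appears in finance documents.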

The hierarchical classification system proposed in this study is composed of back-propagation classifiers (three-layer feed-forward neural networks). In this system, each classifier is in charge of classifying the child categories of a single parent category (page 284). The system was validated on two sets of test documents (composed, respectively, of 500 and 3,000 documents). The results compare favorably with those of hierarchical vector space model (VSM) and hierarchical k-nearest neighbors classifiers. This paper is an interesting and original contribution to the field of text classification.
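The routing idea, one classifier per parent category choosing among that parent's children, can be sketched as follows. This is a minimal illustration, not the authors' system: the keyword-counting classifier is a stand-in for a trained three-layer network, and CategoryNode, keyword_classifier, classify_document, and the toy hierarchy are all hypothetical names and data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CategoryNode:
    name: str
    children: Dict[str, "CategoryNode"] = field(default_factory=dict)
    # Picks one child name for a document; stands in for the trained
    # network attached to this parent category.
    classifier: Optional[Callable[[List[str]], str]] = None


def keyword_classifier(keywords):
    """Stand-in classifier (assumption): choose the child whose keywords
    occur most often in the document."""
    def classify(tokens):
        scores = {
            child: sum(tokens.count(word) for word in words)
            for child, words in keywords.items()
        }
        return max(scores, key=scores.get)
    return classify


def classify_document(root, tokens):
    """Walk from the root to a leaf, consulting one classifier per level."""
    path = []
    node = root
    while node.children:
        child_name = node.classifier(tokens)
        path.append(child_name)
        node = node.children[child_name]
    return path


# Toy hierarchy (hypothetical categories and keywords).
science = CategoryNode(
    "science",
    children={"physics": CategoryNode("physics"), "biology": CategoryNode("biology")},
    classifier=keyword_classifier(
        {"physics": ["quantum", "particle"], "biology": ["cell", "gene"]}
    ),
)
root = CategoryNode(
    "root",
    children={"science": science, "sports": CategoryNode("sports")},
    classifier=keyword_classifier(
        {"science": ["quantum", "particle", "cell", "gene"], "sports": ["goal", "match"]}
    ),
)

print(classify_document(root, ["the", "gene", "regulates", "cell", "growth"]))
print(classify_document(root, ["late", "goal", "wins", "match"]))
```

Because each classifier only has to separate the children of one parent, every decision involves a small number of classes, at the cost that an error near the root cannot be corrected further down.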

Reviewer: Dominic Forest
Review #: CR133126
[1] Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys 34, 1 (2002), 1–47.
Document Analysis (I.7.5 ... )
Connectionism And Neural Nets (I.2.6 ... )
Design Methodology (I.5.2 )