ComputingReviews.com

A hierarchical neural network document classifier with linguistic feature selection
Chen C., Lee H., Hwang C. Applied Intelligence 23(3): 277-294, 2005. Type: Article

Date Reviewed: 08/02/06

Over the last decade, automatic text classification (or categorization), the activity of labeling natural language texts with thematic categories [1], has been a very active research field. Various techniques have been developed to efficiently sort large numbers of documents into predefined sets of thematic categories. Although classification algorithms have become increasingly efficient, document classification remains a challenge, mostly because it is a content-dependent task that involves linguistic and semantic considerations. For instance, feature selection (the task of identifying the most discriminant and semantically relevant words for distinguishing documents) is certainly one of the most challenging steps in text classification.

This paper focuses on two important aspects of text classification: the feature selection task and the classification system. The authors present a feature selection methodology based on two measures of term significance, conformity and uniformity. Conformity indicates how a word's frequency is distributed among categories. Uniformity indicates how a word that is relevant to a specific category is distributed among the documents belonging to that category. According to these measures, a word relevant to a category should occur mostly in the documents of that category (conformity) and, at the same time, be spread evenly across those documents (uniformity).
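The two measures can be illustrated with a minimal sketch. The review does not reproduce the authors' formulas, so the definitions below are assumptions chosen only to match the verbal description: conformity is taken as the share of a word's occurrences that fall in one category, and uniformity as the normalized entropy of its counts across that category's documents. The function names and the toy corpus are hypothetical.

```python
import math

def conformity(word, category, docs_by_category):
    """Assumed definition: share of the word's occurrences found in `category`."""
    counts = {
        cat: sum(doc.count(word) for doc in docs)
        for cat, docs in docs_by_category.items()
    }
    total = sum(counts.values())
    return counts[category] / total if total else 0.0

def uniformity(word, category, docs_by_category):
    """Assumed definition: normalized entropy of the word's counts over the
    documents of `category` (1.0 means perfectly even spread)."""
    docs = docs_by_category[category]
    freqs = [doc.count(word) for doc in docs]
    total = sum(freqs)
    if total == 0:
        return 0.0
    if len(docs) == 1:
        return 1.0  # a single document is trivially "even"
    entropy = -sum((f / total) * math.log(f / total) for f in freqs if f > 0)
    return entropy / math.log(len(docs))

# Toy corpus (hypothetical): each document is a list of tokens.
corpus = {
    "sports": [["goal", "team", "goal"], ["team", "goal"], ["match", "team"]],
    "finance": [["stock", "team"], ["stock", "market"]],
}

team_conformity = conformity("team", "sports", corpus)
team_uniformity = uniformity("team", "sports", corpus)
goal_uniformity = uniformity("goal", "sports", corpus)
```

Under these assumed definitions, "team" occurs once in every sports document and so scores higher on uniformity than "goal", which is concentrated in two of the three documents.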

The hierarchical classification system proposed in this study is composed of back-propagation classifiers, each a three-layer feed-forward neural network. In this system, each classifier is responsible for distinguishing among the child categories of a single parent category (page 284). The system was validated on two sets of test documents (composed, respectively, of 500 and 3,000 documents). The results compare favorably with those of hierarchical vector space model (VSM) and hierarchical k-nearest neighbors classifiers. This paper is an interesting and original contribution to the field of text classification.
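The one-classifier-per-parent arrangement can be sketched as a top-down routing loop. This is only an illustration of the structure described above, not the authors' implementation: the category tree is hypothetical, and a simple keyword counter stands in for each trained back-propagation network.

```python
from typing import Callable, Dict, List, Set

# Hypothetical category tree: parent category -> child categories.
TREE: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
}

def keyword_classifier(keywords: Dict[str, Set[str]]) -> Callable[[List[str]], str]:
    """Stand-in for a trained neural network: returns the child category
    whose keyword set matches the most tokens."""
    def classify(tokens: List[str]) -> str:
        return max(keywords, key=lambda child: sum(t in keywords[child] for t in tokens))
    return classify

# One classifier per parent category, each choosing among that parent's children.
CLASSIFIERS: Dict[str, Callable[[List[str]], str]] = {
    "root": keyword_classifier({
        "science": {"atom", "cell", "energy", "gene"},
        "sports": {"goal", "team"},
    }),
    "science": keyword_classifier({
        "physics": {"atom", "energy"},
        "biology": {"cell", "gene"},
    }),
}

def classify_hierarchically(tokens: List[str], node: str = "root") -> List[str]:
    """Walk down the tree, asking each parent's classifier for the next child,
    until a category with no children is reached."""
    path = []
    while node in TREE:
        node = CLASSIFIERS[node](tokens)
        path.append(node)
    return path

physics_path = classify_hierarchically(["the", "atom", "energy"])
sports_path = classify_hierarchically(["team", "goal"])
```

Each document is handled only by the classifiers along its path, so each network has to separate just the siblings under one parent rather than every category at once.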

[1] Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys 34, 1 (2002), 1–47.

Reviewer: Dominic Forest

Review #: CR133126
