Computing Reviews
Effect of feature selection methods on machine learning classifiers for detecting email spams
Trivedi S., Dey S. RACS 2013 (Proceedings of the 2013 Research in Adaptive and Convergent Systems, Montreal, Quebec, Canada, Oct 1-4, 2013), 35-40. 2013. Type: Proceedings
Date Reviewed: Jan 23 2014

Email has become ubiquitous as a fast and inexpensive form of communication. Algorithms that detect and filter unsolicited email (spam) continue to evolve in order to remain robust against ever more sophisticated spammers. This paper evaluates the effect of two feature selection algorithms on the performance of four classifiers for spam email detection.

The paper is well organized, but unfortunately many critical explanations are missing or unclear. The fitness function used for the genetic search is never clearly explained, and the greedy algorithm is not well explained either. In addition, accuracies are computed but not placed in the context of other research results. The authors also report false match rates for their own experiments, but not for those of previous researchers, which weakens their conclusions. A proper hypothesis test would have involved running a Monte Carlo set of experiments to evaluate multiple feature set selections for each selection method. In this way, the average performance of each selection method could be computed for each of the classifiers, providing a stronger statistical basis for the conclusions.
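The Monte Carlo evaluation suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' experimental setup: the synthetic data, the two selectors (`select_random` and `select_by_mean_gap`), and the nearest-centroid classifier are hypothetical stand-ins for the paper's genetic and greedy searches and its four classifiers.

```python
import random
import statistics

def make_dataset(rng, n_samples=200, n_features=20, n_informative=5):
    """Synthetic two-class data (assumption): only the first
    n_informative features have a class-dependent mean."""
    data = []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(1.0 if (label == 1 and j < n_informative) else 0.0, 1.0)
               for j in range(n_features)]
        data.append((row, label))
    return data

def class_means(train, features):
    """Per-class mean of each selected feature."""
    means = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        means[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return means

def select_random(train, k, rng):
    """Hypothetical stochastic selector: k features chosen at random."""
    return rng.sample(range(len(train[0][0])), k)

def select_by_mean_gap(train, k, rng):
    """Hypothetical deterministic selector: k features with the largest
    gap between class means on the training split."""
    n = len(train[0][0])
    means = class_means(train, range(n))
    gaps = [abs(means[1][j] - means[0][j]) for j in range(n)]
    return sorted(range(n), key=lambda j: gaps[j], reverse=True)[:k]

def accuracy(train, test, features):
    """Nearest-centroid classifier restricted to the selected features."""
    means = class_means(train, features)
    correct = 0
    for x, y in test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(features, means[c]))
                for c in (0, 1)}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(test)

def monte_carlo(selector, data, n_trials=30, k=5, seed=0):
    """Repeat split -> select -> classify; return mean and standard
    deviation of test accuracy over the trials."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        split = int(0.7 * len(shuffled))
        train, test = shuffled[:split], shuffled[split:]
        features = selector(train, k, rng)
        scores.append(accuracy(train, test, features))
    return statistics.fmean(scores), statistics.stdev(scores)

if __name__ == "__main__":
    data = make_dataset(random.Random(1))
    for name, selector in [("random", select_random),
                           ("mean gap", select_by_mean_gap)]:
        mean, sd = monte_carlo(selector, data)
        print(f"{name}: mean accuracy {mean:.3f}, sd {sd:.3f}")
```

Reporting a mean and a spread per selection method, repeated for each classifier, is what would allow a standard hypothesis test to be applied to the comparison.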

Although this work is well intentioned, it is not clear what actual value it has for researchers working on spam classifiers or feature selection, aside from a brief list of literature on spam detection. Feature subset selection is known to be nondeterministic polynomial-time (NP)-hard, and efficient approximation algorithms are generally application dependent. The authors have provided no insight into why one subset selection algorithm would be superior to the other for the given application. Given the limited level of detail, it is difficult to have confidence in the conclusions.
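To give a sense of why exact subset selection is impractical and approximations are used instead, the short sketch below compares the number of candidate subsets an exhaustive search must consider with the number of evaluations made by plain greedy forward selection. Forward selection is an assumption chosen for illustration; the paper's greedy algorithm may differ, and the function names are hypothetical.

```python
from itertools import combinations

def exhaustive_subset_count(n):
    """Number of non-empty subsets of n features."""
    return 2 ** n - 1

def greedy_forward_evaluations(n):
    """Subset evaluations made by greedy forward selection when it runs
    until all n features are added: n + (n - 1) + ... + 1."""
    return n * (n + 1) // 2

def enumerate_subsets(n):
    """Brute-force enumeration, feasible only for small n."""
    return [s for r in range(1, n + 1) for s in combinations(range(n), r)]

if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        print(n, exhaustive_subset_count(n), greedy_forward_evaluations(n))
```

The exhaustive count doubles with every added feature, while the greedy count grows only quadratically, at the cost of no guarantee that the chosen subset is optimal.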

Reviewer:  Terry Riopka Review #: CR141922 (1405-0390)
Design Methodology (I.5.2)
Natural Language Processing (I.2.7)
 
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®