Computing Reviews
An improved data characterization method and its application in classification algorithm recommendation
Wang G., Song Q., Zhu X. Applied Intelligence 43(4): 892-912, 2015. Type: Article
Date Reviewed: Jan 20 2016

Classification is an active research problem, and numerous classification algorithms have been proposed over the years. Some algorithms perform better than others, depending on the dataset. The "no silver bullet" or "no free lunch" theorem is an informal theorem stating that no single classification algorithm outperforms all others on all datasets. This informal theorem is essentially what keeps many data scientists in business--each dataset has its own idiosyncrasies, and different classification algorithms need to be explored to find the one that best meets the needs of the problem at hand. Many practitioners also choose ensemble models, which combine multiple classification algorithms.

This interesting work attempts to automate, or at least speed up, the process of finding the best classification algorithm for a given dataset. The proposed method is based on the premise that if a classification algorithm performs well on a given dataset, then it will also perform well on other, similar datasets. The method operates by: (1) clustering historical datasets; (2) identifying applicable classifiers for each cluster; (3) identifying the nearest cluster for the given dataset; and (4) evaluating the recommended classifiers for that cluster. Naturally, steps 1 and 2 need to be performed only once as a setup activity, while steps 3 and 4 are performed for each new dataset.
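The four steps above can be sketched in a few lines of Python. This is only an illustration, not the paper's method: the dataset names, meta-feature vectors, classifier labels, and the crude size-threshold "clustering" below are all invented stand-ins for the authors' actual meta-features and clustering algorithm.

```python
import math
from collections import Counter, defaultdict

# Invented meta-feature vectors (n_instances, n_features, class entropy),
# each paired with the classifier that historically performed best on it.
history = [
    ("small-a", (150, 4, 1.6), "k-NN"),
    ("small-b", (178, 13, 1.5), "k-NN"),
    ("small-c", (210, 7, 1.4), "SVM"),
    ("large-a", (45000, 16, 0.5), "C4.5"),
    ("large-b", (48000, 14, 0.8), "C4.5"),
]

def dist(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Steps 1-2 (one-time setup): cluster the historical datasets and record
# which classifiers did well in each cluster. A simple size threshold
# stands in for a real clustering algorithm here.
clusters = defaultdict(list)
for name, feats, best in history:
    key = "large" if feats[0] > 10000 else "small"
    clusters[key].append((feats, best))

centroids = {
    key: tuple(sum(f[i] for f, _ in members) / len(members) for i in range(3))
    for key, members in clusters.items()
}
recommended = {
    key: [clf for clf, _ in Counter(b for _, b in members).most_common()]
    for key, members in clusters.items()
}

# Steps 3-4 (per new dataset): find the nearest cluster and return its
# ranked classifier list for the user to evaluate.
def recommend(meta_features):
    nearest = min(centroids, key=lambda k: dist(centroids[k], meta_features))
    return recommended[nearest]

print(recommend((200, 10, 1.5)))  # a small, high-entropy dataset -> ['k-NN', 'SVM']
```

In this toy setup, a new small, high-entropy dataset maps to the "small" cluster, so the classifiers that did well there are recommended first.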

The authors present experimental results using 17 different classification algorithms over 84 public UCI datasets. The classification algorithms considered are quite varied, spanning probability-based, tree-based, rule-based, instance-based, and support vector-based algorithms, as well as ensembles. The 84 datasets are also quite diverse, covering categories such as diseases, cars, molecular biology, zoology, languages, and physics. The results show generally high hit rates--under certain configurations, one of the top three recommended classifiers is the optimal classifier for the given dataset in more than 90 percent of cases. These detailed experimental results are one of the high points of the proposed work.
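The top-k hit rate cited above is straightforward to compute: a recommendation "hits" when the truly optimal classifier for a dataset appears among the top k recommended ones. A minimal sketch, with invented recommendation lists and ground-truth labels:

```python
def hit_rate(recommendations, actual_best, k=3):
    """Fraction of datasets whose optimal classifier appears in the
    top-k recommended list for that dataset."""
    hits = sum(1 for recs, best in zip(recommendations, actual_best)
               if best in recs[:k])
    return hits / len(actual_best)

# Invented example: ranked recommendations for three datasets, and the
# classifier that actually performed best on each.
recs = [["SVM", "k-NN", "C4.5"],
        ["C4.5", "RIPPER", "SVM"],
        ["naive Bayes", "k-NN", "SVM"]]
best = ["k-NN", "C4.5", "random forest"]

print(hit_rate(recs, best, k=3))  # 2 of 3 datasets hit -> 0.666...
```

Note that the hit rate is monotonically non-decreasing in k, which is why the paper's top-three figure is higher than a top-one figure would be.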

It has not escaped my attention that this work, which begins by stating the “no free lunch” theorem, very possibly contradicts the same theorem. After all, an “ultimate” meta classification algorithm could be defined as one that first uses Wang et al.’s method to find the recommended classification algorithms, tests the top recommended algorithms, and then proceeds to perform the actual classification.

Reviewer: Amrinder Arora | Review #: CR144115 (1605-0354)
Algorithms (I.5.3)
Classifier Design And Evaluation (I.5.2)
Other reviews under "Algorithms":
Monte Carlo comparison of six hierarchical clustering methods on random data
Jain N., Indrayan A., Goel L. Pattern Recognition 19(1): 95-99, 1986. Type: Article
Nov 1 1987
A parallel nonlinear mapping algorithm
Shen C., Lee R., Chin Y. International Journal of Pattern Recognition and Artificial Intelligence 1(1): 53-69, 1987. Type: Article
Jun 1 1988
Algorithms for clustering data
Jain A., Dubes R., Prentice-Hall, Inc., Upper Saddle River, NJ, 1988. Type: Book (9780130222787)
Jun 1 1989

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®