Computing Reviews
Towards UCI+: a mindful repository design
Macià N., Bernadó-Mansilla E. Information Sciences 261: 237-262, 2014. Type: Article
Date Reviewed: Sep 12 2014

The evaluation of machine learning algorithms has always been controversial. Which datasets, experimental settings, and statistical tests should be chosen? Dataset repositories such as UCI have been enormously handy for machine learning research. However, datasets are commonly selected carelessly (when they are not cherry-picked).

This paper is not the first criticism of the way these datasets are used, but it is the most comprehensive, insightful, and constructive so far. The use of complexity measures and a characterization of the UCI datasets are most welcome. Nonetheless, the analysis would have been more complete had it included complexity measures derived from (approximations of) Kolmogorov complexity and performance metrics other than accuracy, the only one used.
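To make the suggestion about Kolmogorov complexity concrete, here is a minimal sketch of one common approximation: the length of a compressed serialization, which upper-bounds the Kolmogorov complexity up to a constant. It is not taken from the paper. The function name `compression_ratio`, the CSV-like serialization, and the two toy datasets are assumptions made purely for illustration.

```python
import random
import zlib


def compression_ratio(rows):
    """Return compressed size / raw size for a dataset serialized as CSV-like text.

    Hypothetical helper: a lower ratio means the compressor found more
    regularity, which is a rough proxy for lower Kolmogorov complexity.
    """
    raw = "\n".join(",".join(str(v) for v in row) for row in rows).encode("utf-8")
    if not raw:
        raise ValueError("empty dataset")
    return len(zlib.compress(raw, 9)) / len(raw)


# Toy data (assumed): a perfectly regular dataset versus a random one.
rng = random.Random(0)
regular = [(i % 2, i % 2) for i in range(2000)]
noisy = [(rng.randint(0, 9), rng.randint(0, 1)) for _ in range(2000)]

print("regular:", compression_ratio(regular))
print("noisy:  ", compression_ratio(noisy))
```

The ratio depends on the serialization and on the compressor, so it is only meaningful for comparing datasets encoded in the same way.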

Despite the implausibility of the assumptions of the no-free-lunch theorem, it pervades the authors’ notion of diversity. More diversity does not mean that problems should cover all ranges of error and complexity measures in a uniform way. More effort should be made to clarify what a “representative” sample of real problems is if we want to assess whether the UCI repository is diverse and “challenging” enough.

The authors also present a basic dataset generator based on injecting distortion (a pattern-based generator would possibly be a better option). In any case, the use of dataset generators jointly with a more regulated and automated evaluation procedure is the way to go.
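As a rough illustration of what injecting distortion into a dataset can look like, the sketch below perturbs features with Gaussian noise and flips a fraction of the class labels. It is not the authors' generator, whose details are not given in this review. The function `inject_distortion` and its parameters `label_noise` and `feature_sigma` are hypothetical names chosen for the example.

```python
import random


def inject_distortion(X, y, label_noise=0.1, feature_sigma=0.0, seed=0):
    """Return a distorted copy of (X, y); the inputs are left unchanged.

    Hypothetical sketch, not the paper's method:
    - each feature value gets additive Gaussian noise with std feature_sigma;
    - each label is replaced, with probability label_noise, by a different
      class drawn uniformly from the classes present in y.
    """
    rng = random.Random(seed)
    classes = sorted(set(y))
    X_out, y_out = [], []
    for row, label in zip(X, y):
        X_out.append([v + rng.gauss(0.0, feature_sigma) for v in row])
        if len(classes) > 1 and rng.random() < label_noise:
            label = rng.choice([c for c in classes if c != label])
        y_out.append(label)
    return X_out, y_out


# Toy data (assumed) to show the call.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y = [0, 1, 0]
X_noisy, y_noisy = inject_distortion(X, y, label_noise=0.3, feature_sigma=0.5)
print(X_noisy, y_noisy)
```

Sweeping `label_noise` and `feature_sigma` over a grid would give a family of progressively harder variants of one problem, which is one way a generator could complement a fixed repository.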

This paper should not only contribute to a debate in the community, but it should also become a must-read for everyone using the UCI datasets for the evaluation of machine learning algorithms.

Reviewer:  Jose Hernandez-Orallo Review #: CR142712 (1412-1070)
Data Warehouse And Repository (H.2.7 ... )
 
 
Learning (I.2.6 )
 

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®