Computing Reviews
Distributed tuning of machine learning algorithms using MapReduce clusters
Ganjisaffar Y., Debeauvais T., Javanmardi S., Caruana R., Lopes C. LDMTA 2011 (Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications, San Diego, CA, Aug 21, 2011), 1-8, 2011. Type: Proceedings
Date Reviewed: Mar 30 2012

While machine learning algorithms have been around for a very long time, they invariably have a human component in the form of tuning--that is, finding the right values for the parameters specific to a given training set. This step can be very time consuming, because it must be repeated many times and each run can be slow. MapReduce and cloud technologies--the power of distributed processing combined with the option of massively scaling hardware on demand--can shorten it considerably.
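
To make the idea concrete, here is a minimal sketch (not the authors' implementation) of why parameter tuning maps so naturally onto MapReduce: each map task trains and evaluates one point of a parameter grid on held-out data, and a reduce step keeps the best-scoring configuration. The parameter names and the train_and_score body are hypothetical placeholders, and a local process pool stands in for the cluster.

import itertools
from concurrent.futures import ProcessPoolExecutor

# Hypothetical parameter grid for a tree-ensemble learner.
PARAM_GRID = {
    "num_trees": [100, 500, 1000],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_leaves": [15, 31, 63],
}

def train_and_score(params):
    """Map step: train one model with `params` and return its
    validation-set score. The body is a dummy stand-in; in the
    paper's setting it would train a ranker or classifier and
    evaluate a metric such as NDCG@k or AUC."""
    score = -sum(abs(v) for v in params.values())  # dummy objective
    return params, score

def best_configuration(results):
    """Reduce step: keep the parameter setting with the highest score."""
    return max(results, key=lambda pair: pair[1])

if __name__ == "__main__":
    keys = sorted(PARAM_GRID)
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(PARAM_GRID[k] for k in keys))]

    # A local process pool stands in for the MapReduce cluster:
    # every grid point is an independent map task.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_and_score, grid))

    params, score = best_configuration(results)
    print(f"best params: {params}, score: {score:.3f}")

Because every grid point is independent, the wall-clock time of the tuning step shrinks roughly in proportion to the number of workers, which is the property the paper exploits.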

This paper attempts to do exactly that by presenting ideas for tuning machine learning algorithms by distributing the work using MapReduce. The authors consider two different machine learning tasks. The first is a ranking task, for which they use the LambdaMART algorithm and the NDCG@k evaluation metric. The second is a binary classification task--detecting vandalism in Wikipedia edits--for which they use a roughly balanced random forest (RBRF) algorithm and the area under the curve (AUC) evaluation metric. These are two specific--but practical and important--contributions of this work.
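
For readers unfamiliar with the two evaluation metrics, the following is a brief sketch of how they are commonly computed (standard textbook definitions, not code from the paper; the example relevance grades and scores are purely illustrative).

import math

def ndcg_at_k(relevances, k):
    """NDCG@k: discounted cumulative gain of the top-k ranked items,
    normalized by the gain of the ideal (sorted) ordering."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative relevance grades of a ranked list and binary classifier scores.
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=5))
print(auc([1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.4, 0.3]))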

The results indicate some progress--at least in the context of the specific evaluation metrics. Using MapReduce, the authors present ideas on how the tuning steps can be shortened, thereby saving practitioners countless hours.

However, the disconnect between the two problems and their corresponding algorithms and evaluation metrics is hard to miss. The only thread that ties them together is that both are machine learning tasks. In that sense, this paper is an amalgam of two mini-papers, and the thread connecting them is rather weak. For example, it is unclear whether the presented work would remain useful for the same problem and the same algorithm if only the evaluation metric were changed; at the very least, no results are presented that suggest this.

If the two insights are simply that MapReduce is a good framework for distributing and harnessing the almost infinite computing power of the cloud and that machine learning algorithms are good candidates for using the MapReduce framework, then that much is accepted without any reservations. Any further general insights on this matter are not presented.

Reviewer: Amrinder Arora
Review #: CR140024 (1208-0841)
Miscellaneous (H.4.m)
 
Other reviews under "Miscellaneous":

Privacy through pseudonymity in user-adaptive systems. Kobsa A., Schreck J. ACM Transactions on Internet Technology 3(2): 149-183, 2003. Type: Article. Date reviewed: Jun 12, 2003
A coding scheme as a basis for the production of customized abstracts. Craven T. Journal of Information Science 13(1): 51-58, 1987. Type: Article. Date reviewed: Mar 1, 1988
Charting the unknown: how computer mapping at Harvard became GIS. Chrisman N., ESRI Press, 2006, 280 pp. Type: Book (9781589481183). Date reviewed: Oct 18, 2006
