Computing Reviews

Distributed tuning of machine learning algorithms using MapReduce clusters
Ganjisaffar Y., Debeauvais T., Javanmardi S., Caruana R., Lopes C. LDMTA 2011 (Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications, San Diego, CA, Aug 21, 2011), 1-8, 2011. Type: Proceedings
Date Reviewed: 03/30/12

Machine learning algorithms have been around for a very long time, but they invariably have a human component in the form of tuning--that is, finding the right values for parameters specific to the training set. Tuning can be slow, because the training run must be repeated many times and each run is itself expensive. MapReduce and cloud technologies--the power of distributed processing and the option of massively scaling the hardware using the cloud architecture--can shorten that step.

This paper attempts to do exactly that by presenting some ideas on tuning machine learning algorithms by distributing the work using MapReduce. The authors consider two different machine learning tasks. The first task is related to the ranking of results; the authors consider the LambdaMART algorithm and the NDCG@k evaluation metric for this task. The second is a binary classification task related to detecting vandalistic edits in Wikipedia. The authors consider a roughly balanced random forest (RBRF) algorithm and area under the curve (AUC) evaluation metric for this task. These are two specific--but practical and important--contributions of this work.
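For readers unfamiliar with the two evaluation metrics, a minimal sketch of each follows. It is an illustration only and not the authors' code: the function names are hypothetical, the NDCG@k variant assumes the common exponential gain (2^rel - 1) with a log2 position discount, and AUC is computed by brute-force pairwise comparison, which is practical only for small inputs.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: a query with no relevant items scores 0
    return dcg_at_k(relevances, k) / ideal


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Both metrics depend only on the ordering the model produces, which is why a tuning run can score each candidate parameter setting independently of the others.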

The results indicate some progress--at least in the context of the specific evaluation metrics. Using MapReduce, the authors present ideas on how the tuning steps can be shortened, thereby saving practitioners countless hours.
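The general idea of splitting a tuning job into map and reduce phases can be sketched as below. This is a single-machine illustration under stated assumptions, not the scheme described in the paper: the `evaluate` function is a hypothetical stand-in for training a model on one cross-validation fold, the parameter grid is made up, and Python's built-in `map` and `functools.reduce` take the place of a real cluster.

```python
from functools import reduce
from itertools import product


def evaluate(config, fold):
    """Hypothetical stand-in for training on one fold and returning a score."""
    trees, rate = config
    # Toy deterministic score that peaks at trees=200, rate=0.1.
    return 1.0 - abs(rate - 0.1) - abs(trees - 200) / 1000.0 - 0.001 * fold


def map_task(task):
    """Map phase: one independent (config, fold) evaluation per task."""
    config, fold = task
    return config, evaluate(config, fold)


def reduce_scores(acc, pair):
    """Reduce phase: accumulate the score sum and count for each config."""
    config, score = pair
    total, count = acc.get(config, (0.0, 0))
    acc[config] = (total + score, count + 1)
    return acc


def tune(grid, n_folds):
    """Return the config with the best mean score, plus all mean scores."""
    tasks = [(config, fold) for config in grid for fold in range(n_folds)]
    mapped = map(map_task, tasks)  # on a cluster, these run in parallel
    totals = reduce(reduce_scores, mapped, {})
    means = {config: total / count for config, (total, count) in totals.items()}
    best = max(means, key=means.get)
    return best, means


# Assumed grid: number of trees crossed with learning rate.
grid = list(product([100, 200, 400], [0.05, 0.1, 0.2]))
best_config, mean_scores = tune(grid, n_folds=3)
```

Because every (configuration, fold) pair is evaluated independently, the map phase parallelizes trivially; only the averaging in the reduce phase needs to see results from more than one task.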

However, the disconnect between the two problems and their corresponding algorithms and evaluation metrics is hard to miss. The only thread that ties them together is that both involve machine learning algorithms, and that thread is rather weak; in that sense, the paper is an amalgam of two mini-papers. For example, it is unclear whether the presented work would be useful for the same problem and the same algorithm if only the evaluation metric were changed. At the least, no results are presented that suggest this.

If the two insights are simply that MapReduce is a good framework for distributing and harnessing the almost infinite computing power of the cloud and that machine learning algorithms are good candidates for using the MapReduce framework, then that much is accepted without any reservations. No further general insights on this matter are presented.

Reviewer:  Amrinder Arora Review #: CR140024 (1208-0841)
