Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Sublinear algorithms for big data applications
Wang D., Han Z., Springer International Publishing, New York, NY, 2015. 85 pp. Type: Book (978-3-319204-47-5)
Date Reviewed: Aug 24 2016

“Big data” is the current buzzword pervading both the worlds of academia and industry. Between hype and distractions, research and innovation on big data proceeds at a very quick pace; in particular, the main subject of this book, the computational complexity of big data processing, is very hot and attracts both researchers and practitioners.

Wang and Han’s book focuses on sublinear algorithms for processing big data. There are different types of sublinear algorithms, including exact and approximate algorithms. (A textbook example of an exact sublinear algorithm is binary search.) The book focuses on sublinear algorithms of (1+&egr;, &dgr;)-approximation category, which, in their very essence, approximately solve a problem with guaranteed error bounds (that is, the difference between the exact and computed solutions is within 1+&egr;) in most cases (that is, the probability that the error is outside the prescribed bound is less than &dgr;). Within this category, sublinear algorithms in time use only a small portion of data (that is, o(N), with N being the amount of data) to approximately solve a problem; on the other hand, sublinear algorithms in space require a small amount of memory (but could scan all data). Finally, sublinear algorithms in communication are executed in several connected processing units, but require a small amount of communications to achieve their tasks.

Research on sublinear algorithms is not widespread because these algorithms were of theoretical interest only in recent years, when they were recognized as particularly attractive for big data processing. As a consequence of this recognition, scientific communications are spreading both in journals and conferences, and new classes have been included in academic courses. However, very few books are devoted to sublinear algorithms, especially for big data applications.

Given this premise, the book sounds potentially interesting. Also, its short format--in accordance with the “Springer Briefs” editorial policy--could attract those readers who are eager to grab the fundamentals of this topic without excessive technical and theoretical details. However, the book meets the objective suggested by its title only partially, thus leaving the reader possibly dissatisfied.

The book can be ideally divided into two parts: the first, quite short, is aimed at presenting the basics of sublinear algorithms, while the second part is devoted to the presentation of three main applications. The first part introduces some notable inequalities derived from probability theory, which constitute the basic tools for devising sublinear approximate algorithms. Then, after a basic example on the computation of the number of samples required to estimate a percentage within a fixed error bound, only two sublinear algorithms are presented. The first algorithm is aimed at estimating the number of distinct elements from a data stream using an amount of memory in the order of the logarithm of the stream cardinality. The second algorithm shows how two resources can be used to accomplish a certain task so that each resource undertakes a sublinear overhead. (This algorithm is presented as a metaphor, where resources are cats flung from a skyscraper in order to estimate the highest floor from which a thrown cat can survive.) Apart from the scanty number of presented algorithms, which is too few for a book with this title, the formalization of the algorithms is too rough, with many typographical errors, formal errors, and poorly presented proofs, thus making the comprehension of these algorithms almost impossible. (I only know how many cats I have killed in my mind trying to understand the correctness of the second algorithm!) What is more disappointing is the absence of a well-thought-out bibliography: only two references are given in this chapter, one of which is merely a link to a web page pointing to a number of PDFs with some surveys, where it is very hard to find the complete proofs that are only sketched in the book.

The second part of the book is devoted to the presentation of three applications of sublinear algorithms in problems involving big data. In the first application, a network of wireless sensors is self-organized in layers so that the sensors of a specific layer can be queried to provide aggregate information within certified error-bounds (up to a bounded probability). In the second application, the MapReduce paradigm (for distributed data processing) is refined to better comply with possible data skewness. In this case, a sublinear algorithm comes into play to distribute data in order to optimally balance the load among nodes in a computing cluster. Finally, the third application is about a sublinear algorithm to test two probability distributions, with the aim of detecting usage profiles in smart electrical grids. All three applications are surely interesting because they come from real-world problems; however, each chapter describes a specific solution proposed by the authors, instead of reporting a more comprehensive presentation of the most important alternatives available in current literature. In this sense, the content of the book appears to be very narrow, thus clashing with the broadness of its title.

The lesson learned by reading this book is that notable inequalities coming from probability theory have a mighty impact in devising really efficient algorithms that can be applied in contexts where big data is involved. For the researcher, this book also shows that there is room for improvement and new discoveries in this flourishing area. In this sense, however, the book leaves out some important contributions appearing in the literature and is not so comprehensive as to give a realistic snapshot of the current state of the art. Also, there is no trace of theoretical aspects, such as probably approximately correct (PAC) learning, which are tightly related to the book’s subject. The book is thus recommended mainly to researchers, but just as a piece of the bigger puzzle of sublinear algorithms for big data processing and applications.

Reviewer:  Corrado Mencar Review #: CR144709 (1611-0797)
Bookmark and Share
  Reviewer Selected
Featured Reviewer
 
 
Algorithms (I.5.3 ... )
 
 
Content Analysis And Indexing (H.3.1 )
 
 
General (H.3.0 )
 
 
Special-Purpose And Application-Based Systems (C.3 )
 
Would you recommend this review?
yes
no
Other reviews under "Algorithms": Date
Monte Carlo comparison of six hierarchical clustering methods on random data
Jain N., Indrayan A., Goel L. Pattern Recognition 19(1): 95-99, 1986. Type: Article
Nov 1 1987
A parallel nonlinear mapping algorithm
Shen C., Lee R., Chin Y. International Journal of Pattern Recognition and Artificial Intelligence 1(1): 53-69, 1987. Type: Article
Jun 1 1988
Algorithms for clustering data
Jain A., Dubes R., Prentice-Hall, Inc., Upper Saddle River, NJ, 1988. Type: Book (9780130222787)
Jun 1 1989
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy