Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Best of 2016 Recommended by Editor Recommended by Reviewer Recommended by Reader
Search
Small summaries for big data
Cormode G., Yi K., Cambridge University Press, Cambridge, UK, 2020. 278 pp. Type: Book (978-1-108477-44-4)
Date Reviewed: Mar 29 2022

One of the dominant characteristics of the current computing ecology, the abundance of information in digital form, is a two-edged sword. On the one hand, it is increasingly likely that any information of interest exists somewhere online. On the other, the very mass of data being collected poses an increasing obstacle to searching, transmitting, processing, and exploiting it. This volume offers a concise handbook for using summaries of massive data to overcome this obstacle.

Chapter 1, the introduction, outlines the magnitude of the problem, surveying several domains that are generating such data. It then discusses a variety of data models that support summarization, including sets (recording which items are present or absent), multisets (capturing the multiplicity of each item), matrices, ordered data, geometric data, and graphs. It reviews the kinds of operations that a summary must support (initialize, update, merge, query, construct, and compress) and different processing models (streaming, parallel, and distributed). Since any summary loses some information, the chapter considers the kinds of guarantees that can be offered, and envisions in some detail a number of applications for data summarization. It concludes with an overview of the mathematical tools used in the subsequent chapters, with special attention to the Markov inequality and its use in the Chebyshev inequality and various forms of the Chernoff bound.

Each of the next six chapters offers several summarization techniques for a different kind of data (set, multiset, and so on). For each technique, the authors discuss the available operations, their complexity and available guarantees, history of development, and accessible implementations. Chapter 8 treats the special challenges posed by distributed information, chapter 9 considers some particular uses of summaries, and chapter 10 offers lower bounds supported by some summaries. Throughout, the book emphasizes high-level description of the methods with pseudocode, supported by clearly marked more formal discussion.

The schematic structure of the book, clearly explained in the introductory chapter, enables the reader to find the required information very quickly. A centralized bibliography lists 243 sources extending to 2019, and there is a short index.

This clear, easy-to-navigate volume will be of primary interest to software developers who need a quick guide to available techniques. However, the formal discussions, while marked “Further Discussion” so as not to distract the practitioner, provide additional depth for readers who want to go deeper, making the book useful to the student as well. The book is an excellent example of a focused tool addressing a specific but very pressing problem, and will become an essential handbook as our digital world becomes more and more congested.

Reviewer:  H. Van Dyke Parunak Review #: CR147423
Bookmark and Share
  Editor Recommended
Featured Reviewer
 
 
Data Storage Representations (E.2 )
 
Would you recommend this review?
yes
no
Other reviews under "Data Storage Representations": Date
An efficient representation for sparse sets
Briggs P., Torczon L. ACM Letters on Programming Languages and Systems 2(1-4): 59-69, 1993. Type: Article
Dec 1 1994
 Adaptive data structures for IP lookups
Ioannidis I., Grama A., Atallah M. Journal of Experimental Algorithmics 10(es): 1.1-es, 2005. Type: Article
Jan 13 2006
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Andoni A., Indyk P. Communications of the ACM 51(1): 117-122, 2008. Type: Article
Oct 15 2009
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy