Computing Reviews

Data intensive distributed computing: challenges and solutions for large-scale information management
Kosar T., Information Science Reference - Imprint of: IGI Publishing, Hershey, PA, 2012. 350 pp. Type: Book
Date Reviewed: 07/29/14

With many books on the market already addressing the large spectrum of big data-related topics, Data intensive distributed computing manages to address the underlying research and technology issues from an original viewpoint. The book is written by several groups of contributors, many of them from academic or research backgrounds. As a result, the topics addressed are highly relevant to researchers working on data-intensive computing or on projects that aim to develop more efficient data-intensive platforms.

The book is structured in four major sections. Section 1 (chapters 1 to 3) introduces several computing paradigms for data-aware infrastructures. These paradigms address the design of new scheduling, throughput optimization, and workflow management for data-aware computing. The described solutions include efficient data placement algorithms and dedicated data-aware scheduling.

Section 2 (chapters 4 to 6) targets distributed storage systems for data-intensive computation. Its three chapters range from a general introduction to the requirements for large data storage systems to concrete descriptions of scalable, fault-tolerant, and bandwidth-optimized storage systems.

Aside from adequate storage systems, data-intensive computation also requires the management of computation workflows. Section 3 (chapters 7 to 9) addresses the casting of workflows in terms of several types of optimization problems that take into account the data, timing constraints, and network bandwidth. I particularly appreciated chapter 9, which presents replica management as a way to alleviate the stringent performance and reliability requirements of modern scientific computing.

Section 4 is all about applications. Current applications of data-intensive computing are relevant to researchers in bioinformatics, data visualization, and heterogeneous resource sharing for large-scale simulations. These three topics are addressed in chapters 10 through 12, respectively.

It is difficult to summarize a book written by more than 12 groups of authors, but for researchers in the data-intensive computing area, this book is a unified reference and an introduction to some of the most relevant research approaches. I do miss a more pragmatic, hands-on introduction to the available implementations and usage scenarios, but the fundamental results and background information on many of the inherent data-intensive architectures make it recommended reading for academic researchers and graduate-level students.

Reviewer: Radu State
Review #: CR142563 (1411-0910)
