Computing Reviews
Big data of complex networks
Dehmer M., Emmert-Streib F., Pickl S., Holzinger A., Chapman & Hall/CRC, Boca Raton, FL, 2016. 332 pp. Type: Book (978-1-498723-61-9)
Date Reviewed: Sep 26 2017
Comparative Review

The rapid growth of online information systems, the ease of collecting data across many users, and the potential commercial value of learning about those users have led to growing interest in methods that address data characterized by high volume (there is a lot of it, often terabytes per day), high velocity (it’s changing rapidly), and high variety (it’s not homogeneous). Some authors extend the list with high value (otherwise we wouldn’t bother with it), and variable veracity (requiring sorting good from bad). The first three Vs in particular frustrate traditional methods of collecting, analyzing, and visualizing such data, stimulating a burst of research interest, conferences, and publications. This review considers three recent works in this area.

Bahga and Madisetti, in Big data science and analytics, offer an integrated, application-oriented synthesis of the field. An introductory chapter defines “big data,” gives a range of domain examples, outlines the overall data flow (collection, preprocessing, analysis, and visualization), and describes the classes of tools that address the steps in that flow (the big data stack). Chapter 2 describes four current examples of this stack, chapter 3 outlines a series of design patterns that are useful in dealing with big data, and chapter 4 reviews four NoSQL databases, increasingly popular because of the “variety” facet of big data.

The next six chapters walk the reader through specific tools for each layer of the stack. Chapter 11 reviews analytical algorithms, and chapter 12 surveys a range of chart and plot types. Throughout, the text is peppered with case studies that illustrate the use of the tools being discussed.

This volume will be useful to an IT professional who wants a well-organized guide to current tools and frameworks. Unfortunately, in a marketplace that is changing so rapidly, one expects that the systems it discusses will be obsolete, or at least greatly modified, in one or two years. In addition, it offers almost no insight into the range of current research addressing big data.

The next two volumes focus more on the underlying research and technology. Both are edited volumes, with little systematic structure guiding the selection or arrangement of the chapters. Thus, while they fill the gap left by Bahga and Madisetti by introducing the reader to deeper technical issues, they do not do nearly as good a job of providing a synthesis of the field.

Pyne, Rao, and Rao, in Big data analytics, offer 14 studies by recognized researchers in big data. These individual authors do a good job of illustrating the techniques they expound with examples of practical applications, including manufacturing, cyber-security, health records, microbiome analysis, neuroscience, and cancer research. The chapters are of uneven emphasis, with some providing little more than a high-level survey while others explain a particular technique in great detail.

Dehmer, Emmert-Streib, Pickl, and Holzinger, in Big data of complex networks, offer 12 papers by various contributors on analytics for big data. The book’s title attempts to set it apart by combining the popular emphasis on “big data” with another topic that is in vogue, “complex networks.” The title offers wonderful promise, reinforced by the statement in the preface that the book is “dedicated exclusively to big data network analysis.” Most analytic tasks for graph-structured data are NP-complete, and conventional methods begin to fail for networks beyond a few hundred nodes. Some contributions to this book do address this challenge, including chapters 4 and 12 on visualization, chapter 5 on finding dominating sets in large graphs, chapter 7 on the analysis of large matrices, and the description in chapter 10 of the ScaleGraph analytics system, based on the X10 programming language for parallel computing. But the other seven chapters are more loosely related to the book’s declared purpose. They invoke the idea of networks in different ways, including communication networks as the origin of the data (chapter 6), using computer networks to process big data (chapters 2 and 3), biological networks as the object of study (chapters 1 and 9), and the legal and policy implications of the information drawn from online networks (chapter 8). In some cases, it is arguable whether the data involved is even big: chapter 11 discusses the use of the graph robustness measure R (not to be confused with the popular computing package) on example graphs of fewer than 5,000 nodes. Specialized researchers may be interested in one or another of the chapters in this volume that happen to deal with their particular research, but it will disappoint readers who are seeking a good survey of the title topic.

The usual publication history of a new technical field follows a clear trajectory. It starts with isolated papers or conference presentations describing seminal research or application projects. Those attracted to the theme begin to join together for workshops, leading to edited volumes of collected papers like Pyne et al. and Dehmer et al. As practitioners gain consensus on the shape of the new field, integrated textbooks appear that pull the research threads together. Bahga and Madisetti offer a promising structure for such a volume, but there is still room for an organized summary of the field that goes beyond a review of existing software tools and pulls the various strands of research into a coherent account of big data.

Reviewer: H. Van Dyke Parunak
Review #: CR145563 (1712-0777)
This review compares the following items:
  • Big data of complex networks
  • Big data science & analytics: a hands-on approach
  • Big data analytics: methods and applications
    Content Analysis And Indexing (H.3.1 )
     
     
    Database Applications (H.2.8 )
     
     
    Clustering (H.3.3 ... )
     
     
    Data Mining (H.2.8 ... )
     
     

    Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®