Computing Reviews

Joe Celko’s complete guide to NoSQL: what every SQL professional needs to know about non-relational databases
Celko J., Morgan Kaufmann Publishers Inc., San Francisco, CA, 2013. 244 pp. Type: Book
Date Reviewed: 03/04/14

Most contemporary database professionals have been educated on relational models and products; however, most lack experience with legacy data systems, and many have little experience with the latest non-relational approaches.

In this book, the author tries to address these gaps. With the rise of non-tabular data, there is a clear need for complementary views on data management. The book takes an eclectic view of non-relational data organizations, commonly grouped under the label NoSQL (“not only SQL”).

The book is organized into 13 short chapters covering a number of heterogeneous NoSQL topics without concentrating on a specific NoSQL product. While the book can hardly be considered a complete guide to NoSQL, since it is neither comprehensive nor lengthy, it does offer useful information on non-relational data management issues. The content is not sufficient to provide a deep understanding of the new mindset, but the references provided at the end of each chapter supply links for further study.

Chapter 1 recounts the development of the transactional world with a review of batch transaction processing; traditional issues of concurrency control; and atomicity, consistency, isolation, and durability (ACID). The author notes that traditional SQL relational database management systems (RDBMSs) are not appropriate for every situation. In chapter 2, Celko introduces columnar databases, which store and process data by column rather than by row, as conventional relational products do. In the third chapter, the author moves on to graph databases, with a discussion of their lack of standards. He describes a few graph languages, such as SPARQL, SPASQL, Gremlin, and Cypher. Chapter 4 discusses the MapReduce model, a widely used approach introduced by Google and adopted at scale by Yahoo! for big data implementations.

Chapter 5 explores other aspects of NoSQL, including streaming databases, optimistic concurrency, and complex event processing. The chapter mentions a few commercial stream-oriented products, such as StreamBase and Kx, but notes that no general standard exists in this area. Chapter 6 focuses on key-value stores with an informal discussion on handling keys and values. The chapter also discusses a number of products that emerged in 2013.

Chapter 7 features textbases, making the case that a lot of information is contained in unstructured text as opposed to structured data. The author introduces text mining and the issue of syntax versus semantics. The chapter ends with a reminder that meaning can be linked to domain-specific vocabulary, as exemplified in legal applications that use LexisNexis, or a medical domain implemented in the IBM Watson AI computing system.

Chapter 8 covers geographical data, illustrated with examples of postal code data from the US, Canada, and Great Britain. In chapter 9, the author presents the inevitable discussion of big data and cloud computing.

Chapter 10 changes direction completely with the introduction of biometrics, fingerprints, and specialized DNA databases. In chapter 11, the author transitions to analytic databases with an overview of the different online analytical processing (OLAP) approaches. Chapter 12 is a brief overview of multi-valued and non-first normal form (NFNF) databases, a somewhat obscure topic for the traditional relational database practitioner.

The book concludes with a reminder that a lot of data is still stored in legacy systems such as IBM’s Information Management System (IMS), the Integrated Database Management System (IDMS), or other pre-relational navigational technologies. While the topic has been dropped from most database management textbooks, the real world demonstrates that these implementations have been very resilient and continue to hold their place in the data management landscape.

The book summarizes various NoSQL topics to acquaint readers with both old and new data management issues outside the realm of the relational framework. The content would be more accessible with better organization, but the book is very timely. Data management novices might struggle to find the relevant nuggets here, but an open-minded seasoned data professional should be able to extract the new material from the heterogeneous coverage.

Aside from weak editing and the errors that remain as a result, the book evinces the author’s experience and knowledge. I found it thought-provoking and believe that it has a place on the data manager’s bookshelf.


Reviewer:  Jean-Pierre Kuilboer Review #: CR142059 (1406-0406)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™