Computing Reviews
Data architecture : from Zen to reality
Tupper C., Morgan Kaufmann Publishers Inc., San Francisco, CA, 2011. 448 pp. Type: Book (978-0-123851-26-0)
Date Reviewed: Feb 10 2012

Data--customer data, operating data, and historical data--is essential to any modern organization. For organizations that recognize the importance of data, gathering and storing such data can be a significant effort. For organizations that do not recognize its importance, and thus do not properly manage it, the effort is often greater than it should be.

This book discusses the various important roles that data plays, the things that can go wrong, and what the various people involved need to understand when planning and running a data management system. Such systems involve far more than the actual information stored, and their components are disparate, ranging from the physical storage used and how (and if) the data is distributed over multiple databases to the people who are managing it, using it, and developing the software that uses it. Toss in factors such as requirements for secure, recoverable storage facilities, the potential for corruption, and reporting systems, and the extent of the problem becomes even more evident.

Though these things are often clear to database administrators, software developers, and financial officers (who need the data for tax authorities, accountants, and investors), they are not always so clear to those in other management positions. In particular, a smaller organization may not appreciate how valuable its data might be, and growth brings further problems when data must be migrated, often from ad hoc databases.

There are 22 chapters in five main sections:

  • “The Principles”--data you might be concerned with, why it is valuable, and how it is often stored and used;
  • “The Problem”--how data stores often evolve as a business changes, productivity, and solutions that cause problems (a particularly interesting chapter);
  • “The Process”--how data is organized and architected, data models, and temporal information as a component of data;
  • “The Product”--practical information on topics such as indexing, keys, and physical aspects of databases;
  • and “Specialized Databases”--data warehouses, object-relational databases, and distributed databases.

The book seems targeted at those with management responsibilities and intended to convince them of the value of properly managing their data. Certainly, much of the content will already be familiar to those who work with databases. On the other hand, will a manager tasked with interacting with a data store at a high level really care about normalization and the kinds of keys stored in various records?

The author makes a number of important points, and much of the information presented here should, in some Panglossian, best-of-all-possible database worlds, be known to all managers involved in data stores, however indirectly, and to almost all higher levels of management.

There are a number of issues that make the book less readable than it should be. If these issues were addressed and the book focused more on its target audience, it would be rather more useful.

Among other issues, the varying levels of detail offered may not be particularly helpful. In one chapter, very high-level information is offered; in another, we find details (but no helpful detailed examples) of how a data warehouse schema might differ from a schema in a transactional database. A number of terms and acronyms are used but not indexed, and there is no glossary. I found myself bewildered at more than one point, and had to resort to paging back and forth trying to find the meaning of an acronym (in one case, the definition appeared several pages after it was used). One still stumps me: a caption to a diagram consisting of two arrows pointing at each other reads, “This excludes GH parent-child relationships.” GH parent-child relationships? Even Google was no help.

A few of the diagrams are opaque, even in context and with captions. For instance, one titled “Resolve Recursive Relationships” consists of a box with a curvy handle-shaped object with an arrowhead pointing back to the box. This diagram is all the more confusing because of a note in the caption beneath it: “History is not a valid use for recursion.” What is this image trying to tell me?

The writing could also use some work. One sentence, while not typical of the whole text, is not entirely unrepresentative:

“This is different from the data defined in the data gathering and classification and business area data modeling in that current data inventory analysis has to do with what data are used currently rather than what data will be needed to support the business area in the future.”

It all (eventually) parses and makes sense, but doesn’t offer much in the way of information.

Reviewer:  Jeffrey Putnam Review #: CR139849 (1206-0559)
Data Storage Representations (E.2)
Data Structures (E.1)
 
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®