Computing Reviews
Making databases work : the pragmatic wisdom of Michael Stonebraker
Brodie M. Association for Computing Machinery and Morgan & Claypool, New York, NY, 2019. Type: Divisible Book
Date Reviewed: Jun 10 2020

Michael Stonebraker, in 2014, won the ACM A. M. Turing Award “for fundamental contributions to the concepts and practices underlying modern database systems.” From the back cover: “The book describes, for the broad computing community, the unique nature, significance, and impact of Mike’s achievements in advancing modern database systems over more than 40 years.” The 36 very readable stories in the book are told from research computer systems perspectives, and, similarly to The Ingres papers [1] published more than 30 years ago, the 39 contributors (Stonebraker and his collaborators--database researchers, systems engineers, and business partners) also discuss what failed and why. The book covers important and diverse aspects of database management system (DBMS) design and development, including software and systems engineering issues associated with large projects.

Editor Michael Brodie correctly observes in his introduction that Stonebraker’s “initial contributions, Ingres and Postgres, did more to make relational databases work in practice than those of any other individual,” and that “a hallmark of [Stonebraker’s] career is to perpetually question conventional wisdom, even of his own making.” This introduction presents an excellent roadmap of the volume and is a very interesting story in itself. It is clear that Stonebraker’s approach has always been to target real-life problems rather than more theoretical research, often using creative compositions of appropriately chosen existing components. For an important example, in Postgres, such components as abstract user-defined datatypes, a clean rules system, and a time-travel (no overwrite) storage system were not new, but were successfully composed into an outstanding product. Overall, Stonebraker’s teams succeeded in “ruthlessly avoiding complexity while navigating a huge design space.”

In a very explicit interview with Marianne Winslett (Chapter 2), as well as elsewhere in the book, Stonebraker observes that in order to make a difference, “you’ve got to find a real-world problem [and] find an enterprise that actually wants your problem solved,” and “if you’re in enough pain, then you’ll try new ideas.” This sounds very familiar, reminding me of a saying I heard in some large companies: “a project has to fail at least twice, and then they will listen to you.” Stonebraker further emphasizes that both Ingres and Postgres (as well as all other projects) were developed with one full-time person as a chief programmer and “three or four or five graduate students, no postdocs” in a few years. The huge positive impact of open-source software on research and practice is demonstrated by Ingres--the first database system for which the prototype was released in source-code form (“and hence provided the nascent database community the first example of a working DBMS that academic researchers could study and modify”)--and by Postgres, which “to this day remains one of the most popular open-source database systems,” supporting functions “that are often missing from commercial products” (Chapter 6, David J. DeWitt). Since “disruptive ideas do not usually find a receptive audience among the established vendors” [see 2], launching a startup to prove the technical superiority of one’s ideas appears to be the preferred option; this led to nine startups cofounded by the serial entrepreneur Stonebraker. The powerful yet simple and usable products of these startups “actually could sell successfully against the Elephants [the big three database companies] ... [thus] helping to unlock the entire industry” (Chapter 4, James Hamilton); in particular, these products include commercial versions of Ingres and Postgres. An important lesson is encountered time and again: “Work with customers early and often. Listen carefully. Don’t be constrained by conventional wisdom.” At the same time, Stonebraker and others observe that the best technology does not always win: strong sales and marketing are of considerable importance.

Some stories discuss “how to create and run” a Stonebraker-like startup, including a venture capitalist’s perspective. Of note, as in business modeling, it is necessary to articulate the essentials and tacit assumptions; a prototype, especially one shown to a nonspecialist audience, “should be simple and crisp and take no more than five minutes to demonstrate the idea” (Stonebraker). This approach also sounds very familiar and reminds me of demonstrating the skeleton of an abstract and precise financial business domain specification to an audience of traders (alas, this took 15 minutes but was still successful). Throughout the book, Stonebraker’s emphasis that “good ideas are invariably simple” can be clearly seen.

“Where We Have Failed” (Chapter 11, Stonebraker) is the most important story in the book. Here, the author observes such serious problems as our expanding field (not only and perhaps even not mainly business data processing), ignoring real customers (and therefore reinventing various, possibly square, wheels), and “paper deluge” (especially publishing “least publishable units”). The latter has been observed by many, notably E. W. Dijkstra in the 1970s, who wrote about write-only papers and speak-only conferences, but the situation has apparently become even worse, resulting in substantial difficulties with reviewing (“reviewing stinks ... [due to] the dizzying sea of junk that we all have to read”), and in expansion of irrelevant theory. This leads, in particular, to two drastically different philosophies for systems research: “make it easy [optimize the least publishable unit grain] or make it relevant [focus on real problems].” The hardest real problems, such as data integration and schema evolution, are often ignored, although they have existed for decades. These old problems reappear because we often forget or ignore history, and “without history, we live in a perpetual present--a domain that, by definition, tends toward the persistently bewildering” [3]. As an example, very serious problems associated with so-called “data quality,” starting with data input, were described in the 1970s [4,5]; some good solutions were proposed at that time, but mostly never implemented. The amount of data has increased by orders of magnitude, and so have the problems. Currently, 60 to 80 percent (and in fact up to 98 percent) of a data scientist’s time is spent on “grunt work preparing datasets of interest,” that is, on “data civilizing,” resulting in Stonebraker’s ongoing Data Tamer project and the Tamr startup “for a product that prepares data for a DBMS instead of being the DBMS.”

Several papers, reprinted and coauthored by Stonebraker, justify the need to replace the more than 25-year-old legacy DBMS codelines (taking their roots from System R) “in favor of a collection of ‘from scratch’ specialized engines.” I tend to agree, especially taking into account the drastic changes in hardware and user interfaces. In one of these papers, “The End of an Architectural Era (It’s Time for a Complete Rewrite),” the authors state in particular that the current relational DBMS vendors “have disk-oriented solutions for a main memory problem.”

While user-defined datatypes (for example, in Postgres) and data civilizing are excellent examples of semantic integrity support, regretfully, the semantic integrity of data is not always made as explicit as it could be. For an interesting example, column-oriented databases (“accessing only the columns needed to answer a particular query”) are a spectacular idea (“C-store: A Column-Oriented DBMS” and the corresponding Vertica startup described in the book), but perhaps domain-orientation--an idea exploited to some extent by Postgres--would be an even better approach (since different attributes may be defined over the same domain), especially for handling and “civilizing” text data, including the synonyms and so on hinted at in the description of the Data Tamer project.

Overall, an important conclusion can be formulated in another quote from the book: “the methods, the attitudes, and the lessons are generally independent of the specific story.” In particular, as Stonebraker stressed, “if you want people to read the papers that you write, you should minimize the effort you ask them to go through in reading your paper.” The papers (stories) in this book satisfy this condition and are highly recommended.

Reviewer: H. I. Kilov
Review #: CR146990 (2011-0251)
1) Stonebraker, M. (Ed.) The Ingres papers: anatomy of a relational database system. Addison-Wesley, Reading, MA, 1986.
2) Christensen, C. M. The innovator’s dilemma. Harvard Business Publishing, Brighton, MA, 1997.
3) Bonea, A.; Dickson, M.; Shuttleworth, S.; Wallis, J. Anxious times: medicine and modernity in nineteenth-century Britain. University of Pittsburgh Press, Pittsburgh, PA, 2019.
4) Kent, W. Data and reality: basic assumptions in data processing reconsidered. North-Holland Pub. Co., New York, NY, 1978.
5) Gilb, T.; Weinberg, G. M. Humanized input: techniques for reliable keyed input. Winthrop Publishers, Cambridge, MA, 1977.
Categories: General (H.2.0); Database Administration (H.2.7); Database Applications (H.2.8)