Computing Reviews
Software failure investigation : a near-miss analysis approach
Eloff J., Bella M., Springer International Publishing, New York, NY, 2017. 119 pp. Type: Book (978-3-319613-33-8)
Date Reviewed: Feb 15 2018

This book bothered me. I discovered both profound positives and profound negatives as I proceeded through this very short book (barely 100 meaty pages).

First, the positives. There is a great need for this book. It attacks its title topic, software failure investigation, in some depth, recommending two approaches to dealing with failures: digital forensics and near-miss analysis. It treats those two subjects largely appropriately, providing a definition of each approach and some implementation suggestions for creating systems to assist with those tasks. So far, so good.

But then there are the negatives. Primarily, there is the problem of how to implement near-miss analysis for software. Basically, the concept is to use things that go slightly wrong to anticipate, and ward off, things that may go more seriously wrong in the future. Truth to tell, that’s rarely how software fails! Hardware tends to fail by breaking or wearing out, and software does neither of those things. (It is worth noting that this is one of the most favorable, and important, traits of software, one that accounts for its emerging success in the computing field as opposed to hardware solutions for the same problems.)

The authors of this book admit to that problem and do their best to deal with it. They recommend that software with a service-level agreement track deviations from that agreement and treat those deviations as near misses. They also suggest that when software has redundant components to deal with potential failures, switches to those redundant components are a good place to do near-miss analysis.
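The authors' two recommendations can be made concrete with a small sketch. It is not taken from the book: the 500 ms response-time limit, the 80% near-miss threshold, and the `NearMissLog` class are all hypothetical. A response that stays within the SLA but uses most of its time budget counts as a near miss, as does any switch to a backup component; an outright SLA breach is logged as a failure.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical SLA: every response must complete within SLA_LIMIT_MS.
# A response that stays within the limit but uses more than
# NEAR_MISS_FRACTION of it is recorded as a near miss.
SLA_LIMIT_MS = 500.0
NEAR_MISS_FRACTION = 0.8


@dataclass
class NearMissLog:
    near_misses: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    def record_latency(self, request_id: str, latency_ms: float) -> None:
        """Classify one response time against the (hypothetical) SLA."""
        if latency_ms > SLA_LIMIT_MS:
            self.failures.append(f"{request_id}: SLA breach ({latency_ms:.0f} ms)")
        elif latency_ms > NEAR_MISS_FRACTION * SLA_LIMIT_MS:
            self.near_misses.append(f"{request_id}: near miss ({latency_ms:.0f} ms)")

    def record_failover(self, component: str, backup: str) -> None:
        """A switch to a redundant component is treated as a near miss:
        service continued, but only because the backup took over."""
        self.near_misses.append(f"failover: {component} -> {backup}")


if __name__ == "__main__":
    log = NearMissLog()
    for rid, ms in [("r1", 120.0), ("r2", 450.0), ("r3", 610.0)]:
        log.record_latency(rid, ms)
    log.record_failover("primary-db", "replica-db")
    print("near misses:", log.near_misses)
    print("failures:", log.failures)
```

Where to put the near-miss threshold is a judgment call; the sketch works only because the SLA, or the failover mechanism, supplies an explicit boundary to measure against.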

But the trouble is that most software does not have a service-level agreement, and most software systems do not use the kind of redundancy in which a system switches over to a backup when things are going wrong. The authors are correct in saying that near-miss analysis could be used in those cases, but they are wrong if they assume that such approaches are in common use. And if they are not commonly used, where does that leave the practitioner who would like to do near-miss analysis?

There are a couple of other things that bother me. The book does a fair job of digging out stories of failed software systems of the past, thus providing motivation for software failure investigation approaches. But there’s a problem there as well: some of the best-known software failure stories come from the writings of Peter Neumann, Robert L. Glass (your reviewer), and Capers Jones, and of those three there is only one inadequate citation (to Neumann) in the book. Perhaps more worrisome is the absence of any references to software reliability techniques that could indeed provide assistance to the authors’ near-miss analysis: software fault tolerance approaches and assertion checking. Both of those techniques could be of great use in near-miss analysis, both approaches are discussed thoroughly in books on software reliability, and yet these authors seem to be unaware of either approach.
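To illustrate how those two techniques might feed near-miss analysis, here is a minimal sketch (not from the book) built on one classic software fault tolerance scheme, the recovery block, whose acceptance test is in effect an executable assertion. The functions `buggy_sqrt`, `trusted_sqrt`, and `sqrt_ok` and the `near_misses` list are hypothetical stand-ins. A primary result that the assertion rejects but the alternate repairs is recorded as a near miss; if both versions fail, that is a real failure. Because the routines here are pure functions, the state checkpointing a full recovery block needs is omitted.

```python
import math
from typing import Callable, List

# Hypothetical sink for near-miss records; a real system might write to a log.
near_misses: List[str] = []


def recovery_block(
    x: float,
    primary: Callable[[float], float],
    alternate: Callable[[float], float],
    acceptance_test: Callable[[float, float], bool],
) -> float:
    """Run primary, check its result with the acceptance test (an executable
    assertion), and fall back to alternate if the check fails.
    A rejected primary result that the alternate repairs is a near miss."""
    result = primary(x)
    if acceptance_test(x, result):
        return result
    near_misses.append(f"primary rejected for input {x!r}: got {result!r}")
    result = alternate(x)
    if not acceptance_test(x, result):
        # Both versions failed the assertion: a real failure, not a near miss.
        raise RuntimeError(f"all versions failed for input {x!r}")
    return result


def buggy_sqrt(x: float) -> float:
    # Hypothetical faulty primary: correct only for x == 0 and x == 4.
    return x / 2


def trusted_sqrt(x: float) -> float:
    # Hypothetical independent alternate.
    return math.sqrt(x)


def sqrt_ok(x: float, r: float) -> bool:
    # Acceptance test: the result must be non-negative and square back to x.
    return r >= 0 and math.isclose(r * r, x, rel_tol=1e-9)


if __name__ == "__main__":
    print(recovery_block(4.0, buggy_sqrt, trusted_sqrt, sqrt_ok))  # primary accepted
    print(recovery_block(9.0, buggy_sqrt, trusted_sqrt, sqrt_ok))  # alternate used
    print(near_misses)
```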

There is one other upsetting possibility here. The authors’ citations are all quite recent, mostly from 2000 or later. The failure stories to which I refer, and the reliability textbooks as well, came out of the previous century. One could say that material in the rapidly moving software field older than, say, 20 years is not worth citing. But software is not a faddish field, and the noted references are just as appropriate to the two major technical advancements of recent times, the agile approaches and open-source software, as they were to systems of yore!

So what’s my bottom line? The concern of this book, software failure, is highly appropriate. The approaches the book recommends to address the problem of software failure, digital forensics and near-miss analysis, may eventually be entirely appropriate. But their prime-time usefulness, especially that of the near-miss approach, needs a huge amount of further work before it can be worthwhile for software practitioners.

Reviewer:  R. L. Glass Review #: CR145855 (1805-0193)
Testing and Debugging (D.2.5)
Distribution, Maintenance, and Enhancement (D.2.7)
Other reviews under "Testing and Debugging":
Software defect removal
Dunn R., McGraw-Hill, Inc., New York, NY, 1984. Type: Book (9789780070183131)
Mar 1 1985
On the optimum checkpoint selection problem
Toueg S., Babaoglu O. SIAM Journal on Computing 13(3): 630-649, 1984. Type: Article
Mar 1 1985
Software testing management
Royer T., Prentice-Hall, Inc., Upper Saddle River, NJ, 1993. Type: Book (9780135329870)
Mar 1 1994

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®