Computing Reviews
Reliability of safety-critical systems : theory and applications
Rausand M., Wiley Publishing, Hoboken, NJ, 2014. 466 pp. Type: Book (978-1-118112-72-4)
Date Reviewed: Mar 9 2016

My current venue (of 16 years and counting) continues to afford me the privilege of working with worldwide and world-class engineering firms in ensuring that their designs of such computer-driven control systems as communication-based train-controls (CBTC) and solid-state interlockings (SSIs) (1) preserve such safety-properties as anti-(train)-collision during railroad operation, and (2) direct hazardous failures in fail-safe directions if components malfunction during operation.

These projects are, of course, quintessentially safety-critical; so if the resulting system proves to be unsafe, it’s back to the proverbial drawing board. The same holds if the deployed system’s unreliability crosses a contractually pre-established threshold.

Program management of such multi-year mega-projects as modernization and automation of subway lines in a large metropolis has the following essential activities and goals in common with all sizeable safety-critical projects:

  • Specification of the new system’s functional requirements;
  • Specification of the new system’s safety requirements, for example, Safety Integrity Level (SIL) 4, which is well explicated in this book (safety requirements are often classified, somewhat counter-intuitively, as “non-functional” requirements);
  • Specification of safety-certification standards, criteria, and activities;
  • Establishment of contractual reliability requirements (ditto “non-functional”);
  • Establishment and negotiation of the project’s budget and schedule;
  • Specification of project-life-cycle deliverables, such as system baselines and corresponding documentation; and
  • Delineation of a strategy for long-term post-deployment support, such support being constrained to preserve or improve both safety and reliability whenever repairs, enhancements, or other modifications of an operational in-service system occur.

If one were to construe my group’s (Vital Systems Integrity, VSI’s) charter and mandate strictly, then this “reliability” book is about many things that are “not our job,” at least until an unreliability threshold is crossed into “un-safety,” as it were. (“Vital” is closely related to “safety-critical” in railroad-speak.)

Another fact along these lines (in this transparent build-up of a straw-man argument) is that CBTC and SSI technologies are what the US Federal Railroad Administration (FRA) refers to as “processor-[computer hardware/software]-driven” technologies. Section 2.8, “Safety Integrity Levels (SILs),” (of this book) states that “software safety integrity [is] not covered in this book,” and that “systematic [that is, system] safety integrity [is also] not fully covered in this book.”

So, that which is “not covered” is in fact a very large part of my group’s job, even to the extent that formal methods, both theorem-proving and model-checking, are brought to bear, at both the software and system level, in producing mathematical proofs of safety properties of a given CBTC or SSI design. (Our pilot CBTC project, which was deployed into revenue service in January 2006, featured 35,000 B-method proofs [1].) Non-coverage of software in this very thorough book is in fact the correct choice, as, for example, Paul Niquette’s celebrated article “Software does not fail” [2] paradoxically, but indisputably, shows. (I continue to endanger my career by quoting this attention-getting but truth-bearing phrase to very intelligent engineers who have risen to upper management, but who have never experienced software workarounds as anything but “fixes,” the actual bugs remaining hidden for years before rearing their ugly, if not deadly, heads.)

Here is the straw man being knocked down: This advanced, well-organized, well-written, and technically authoritative book will remain within my reach at all times, even when I have my safety-hat on (which is most of the time). The VSI Group’s familiarity with its topics ranges from nodding to in-depth, and the preponderance of the theory treated in the book is used in our system safety certification activities, which far transcend the use of formal methods. In fact, I am hard put to find something in this book that we have not come across in our safety-assurance and certification activities. For example, even Petri nets figure in our activities, and we administered an in-house survey course on the subject. The book’s treatment of Petri nets is excellent and to the point.

The chapters are: (1) “Introduction”; (2) “Concepts and Requirements”; (3) “Failures and Failure Analysis”; (4) “Testing and Maintenance”; (5) “Reliability Quantification”; (6) “Reliability Data Sources”; (7) “Demand Modes and Performance Measures”; (8) “Average Probability of Failure on Demand”; (9) “Average Frequency of Dangerous Failures”; (10) “Common-Cause Failures”; (11) “Imperfect Proof-Testing”; (12) “Spurious Activation”; (13) “Uncertainty Assessment”; and (14) “Closure.” There is an appendix, “Elements of Probability Theory,” which is so focused on, and useful for, reliability calculations in the large, that it alone is worth the price of the book. This 20-odd page appendix relays the fundamentals of probability and reliability applications in a crystal-clear manner.

My first impression of this book was, “It’s all there, between the covers of a mere four hundred pages, everything I’ve come across and used in the last 16 years: the concepts, the calculations, the IEC 61508 Functional Safety Standard as guiding document, even the old-friend acronyms.”

My current, post-reading impression is the same: This outstanding book is a must-read for anyone involved in safety-critical systems.


Reviewer: George Hacken
Review #: CR144225 (1605-0273)

[1] Abrial, J. R. The B-book: assigning meanings to programs. Cambridge University Press, New York, NY, 1996.
[2] Niquette, P. Software Does Not Fail, http://niquette.com/paul/issue/softwr02.htm, 1996 (Accessed 10/06/2015).
Reliability, Availability, And Serviceability (C.4 ... )
 
 
Real-Time And Embedded Systems (C.3 ... )
 