Computing Reviews
Resilient computer system design
Castano V., Schagaev I., Springer Publishing Company, Incorporated, New York, NY, 2015. 256 pp. Type: Book (978-3-319150-68-0)
Date Reviewed: Aug 26 2015

Resilience is understood as the act of returning to an operational state in response to failures, breaches, attacks, and similar disturbances. This book addresses the emerging issue of keeping computer systems and software resilient in the face of external or internal disruptions to their operation. At the risk of splitting hairs, the system property corresponding to resilience, that is, the system’s ability to return to an operational state under disruption, would properly be called resiliency; the authors, however, never spell out this distinction.

The book comprises ten chapters covering a number of interrelated topics, which can be roughly grouped into four categories: basic concepts and background of resilience (chapters 1 and 2), fundamentals of fault tolerance with respect to hardware (chapters 3 to 5), hardware and software support for resilience, with an implementation (chapters 6 to 8), and a conclusion with a vision for the future (the last two chapters). The key point is that the authors view resilience as an extension, or generalization, of fault tolerance, treating both fault tolerance and resilience not as properties but as processes; this idea drives their work. Everything is placed in the context of safety-critical systems, which are the authors’ focus.

In defining resilience, the authors bring up a number of attributes that together form a comprehensive set of properties for evaluating resilience, primarily a combination of reliability, safety, security, and performance. Interestingly, security is defined as comprising integrity, maintainability, and availability. This is somewhat inconsistent with the commonly adopted view of security as having confidentiality, integrity, and availability (CIA) components or objectives. The CIA view is grounded in a logical definition of security: the extent to which information and data are protected so that unauthorized persons or systems cannot read them (confidentiality) or modify them (integrity), and authorized persons or systems are not denied access to them (availability). This has been spelled out in FIPS PUB 199 [1], which is a standard for security categorization.

What is unique and interesting in this book is that the authors also include evolvability as a component of resilience, by which they mean the ability to “perform changes to the system, decreasing its level of performance or reliability for a specific time range to compensate for faults or during exceptional circumstances.” More specifically, they consider that a resilient system must have “the ability to be adaptable, understanding adaptability as the ability to evolve while executing.”

The chapters on fault tolerance provide a fairly standard account of the major fault-related concepts, with redundancy as the main vehicle, and discuss the negative impact of radiation. The added value of this part is a brief outline of the authors’ own generalized algorithm of fault tolerance (GAFT). With this groundwork in place, the authors proceed to describe their evolving reconfigurable architecture (ERA) over three chapters: on hardware, on software, and on a prototype field-programmable gate array (FPGA) implementation. The last of these is probably the most interesting part of the book, with a discussion of real design decisions, including a comparison of the unique ERA architecture with the x86, SPARC, and ARM architectures.
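To make “redundancy as the main vehicle” concrete, here is a minimal sketch of its classic hardware form, triple modular redundancy (TMR). This is a generic textbook technique chosen for illustration, not the authors’ GAFT or their ERA design, and the function names are hypothetical: three replicas compute the same value, and a bitwise majority voter masks a fault confined to any one replica, such as a radiation-induced bit flip.

```python
"""Illustrative sketch of triple modular redundancy (TMR) voting.

Generic textbook technique; NOT the book's GAFT or ERA design.
Function names are hypothetical.
"""


def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant outputs.

    Each result bit is 1 iff at least two of the three inputs have a 1
    in that position, so a fault confined to a single replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a: int, b: int, c: int) -> list[int]:
    """Return the indices (0, 1, 2) of replicas that differ from the vote."""
    voted = tmr_vote(a, b, c)
    return [i for i, x in enumerate((a, b, c)) if x != voted]


if __name__ == "__main__":
    good = 0b1011_0110
    # Simulate a single-bit upset (e.g. from radiation) in replica 1.
    flipped = good ^ 0b0000_1000
    print(bin(tmr_vote(good, flipped, good)))         # 0b10110110
    print(disagreeing_replicas(good, flipped, good))  # [1]
```

The voter both masks the error and identifies the faulty replica, which is the information a fault-tolerance process would need for any subsequent reconfiguration step.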

In the concluding part, the authors summarize their work and offer some predictions on the future of evolving architectures. The book is valuable mostly for researchers working on computer architectures who want to add resilience to their designs. As a final remark, I wonder whether the manuscript of the book was ever edited by the publisher. There are numerous typographical errors in it, for example in names, from “Moor’s law” on page 1 to “Gutknetch” on page 251. They may not obstruct understanding of the material, but they are extremely annoying, to say the least.

Reviewer: Janusz Zalewski
Review #: CR143724 (1511-0925)
[1] National Institute of Standards and Technology. "Standards for Security Categorization of Federal Information and Information Systems." FIPS PUB 199, 2004. http://csrc.nist.gov/publications/fips/fips199/FIPS-PUB-199-final.pdf.
Reliability, Availability, And Serviceability (C.4 ...)
Reliability (D.4.5)
Other reviews under "Reliability, Availability, And Serviceability": Date
Implementing fault-tolerant services using the state machine approach: a tutorial
Schneider F. ACM Computing Surveys 22(4): 299-319, 1990. Type: Article
Jul 1 1992
Network reliability and algebraic structures
Shier D., Clarendon Press, New York, NY, 1991. Type: Book (9780198533863)
Sep 1 1992
On building systems that will fail
Corbató F. Communications of the ACM 34(9): 72-81, 1991. Type: Article
Sep 1 1992

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®