The authors present a methodical, pragmatic approach to reliable computer system design and evaluation. They have written a stimulating and practical how-to book for new and experienced computer designers, engineers, professors, students, and project managers. This book has three intended audiences: advanced undergraduate students who are interested in reliable system design and who have taken some basic courses in computer science; graduate students looking for a higher-level course in reliable system design; and practicing engineers interested in incorporating comprehensive reliability techniques into their next designs.
This second edition is extensively revised and adds valuable material from the authors’ theoretical and practical experience of the 10 years since the first edition [1] appeared. The book is structured in three main parts and 11 chapters. Part 1, “The Theory of Reliable System Design,” consists of six chapters that, taken together, could almost be called an all-purpose outline of how to use the techniques for fault-tolerant computer design. Chapter 1 presents fundamental concepts of reliable systems. Chapter 2 discusses faults and their manifestations. Chapter 3 covers reliability and availability techniques. Chapter 4 discusses maintainability and testing techniques. Chapter 5 gives evaluation criteria for system reliability, with a detailed taxonomy of modeling techniques. Chapter 6 discusses fundamental financial aspects of the system life cycle phases.
Part 2 describes how different system designers have used the techniques described in Part 1 to create advanced and successful fault-tolerant architectures. It contains four chapters, each dedicated to a distinct application of fault-tolerant computers. Chapter 7 presents general-purpose computing with IBM and DEC mainframes, including detailed case studies. Chapter 8 discusses high-availability systems architectures. Chapter 9 covers long-lived, highly redundant systems. Chapter 10 addresses critical computations carried out in real-time systems with special hardware and software. I suggest that advanced readers skim over chapter 5 and consult additional references to supplement chapters 7, 9, and 10.
Part 3, which consists only of chapter 11, introduces an eight-step, top-down design methodology for the systematic design of dependable systems. A case study is dedicated to the design process of the DEC VAXft 310.
The book is completed by five detailed appendices, a glossary of terms, a nearly exhaustive bibliography containing 1094 references, the credits, and a detailed cross-referencing index. Chapters 2 through 6 provide 80 adequate problems, for which mathematical support is presented in chapter 5 and Appendix E. This volume is useful as a classroom textbook.
The authors balance text, pictures, figures, tables, and flowcharts. The book is not too long, and is a suitable introduction to its two subjects for the three audiences specified. Its best chapters are 1, 2, 5, 7, and 9; the best appendices are B and E. Clear and comprehensive text, plenty of white space, and good typography for figures, tables, and mathematical expressions polish this edition. Having been a computer engineer, systems and application programmer, scientific researcher, author, teacher, and consultant during an 18-year professional career, I am highly impressed by this book, and I congratulate the authors.