Computing Reviews, the leading online review service for computing literature.

Implementing efficient fault containment for multiprocessors: confining faults in a shared-memory multiprocessor environment
Rosenblum M., Chapin J., Teodosiu D., Devine S., Lahiri T., Gupta A. Communications of the ACM 39(9): 52-61, 1996. Type: Article

Date Reviewed: Jun 1 1997

The authors contend that large-scale multiprocessors are plagued by failures in hardware and software that frequently bring down the entire system, requiring that the machine be rebooted. They propose a scheme for fault containment, then attempt to show its effectiveness by simulation. I had great difficulty following this paper. Much of the work leading to this project is described in specialized conference proceedings and symposia. It may be difficult for the reader to locate those papers, even though they are appropriately cited. The average CACM reader will not be familiar with this background, which should have been summarized where applicable. Experiments and simulations were run on a model called Hive to show the effectiveness of the approach. Hive is not explained clearly with respect to the problem at hand. The results are summarized in a table and a figure. The table shows only the errors injected into the system, not the effectiveness of the technique. The figure shows the time to completion of five multiprocessor combinations running three programs, but does not demonstrate the advantages, if any, of the technique. The accompanying text did not clarify the figure; it confused me further. This paper should not have appeared in CACM; it would have been better suited to a more specialized journal.

Reviewer: Ivan Flores	Review #: CR120472 (9706-0463)

Multiprocessing/ Multiprogramming/ Multitasking (D.4.1 ... )

Fault-Tolerance (D.4.5 ... )

Modeling And Prediction (D.4.8 ... )

General (D.4.0 )

Performance (D.4.8 )

Reliability (D.4.5 )
