Computing Reviews
Implementing efficient fault containment for multiprocessors: confining faults in a shared-memory multiprocessor environment
Rosenblum M., Chapin J., Teodosiu D., Devine S., Lahiri T., Gupta A. Communications of the ACM 39(9): 52-61, 1996. Type: Article
Date Reviewed: Jun 1 1997

The authors contend that large-scale multiprocessors are plagued by failures in hardware and software that frequently bring down the entire system, requiring that the machine be rebooted. They propose a scheme for fault containment, then attempt to show its effectiveness by simulation.

I had great difficulty following this paper. Much of the work leading to this project is described in specialized conference and symposium proceedings. Although those papers are appropriately cited, they may be difficult for the reader to locate. The average CACM reader will not be familiar with this background, which should have been summarized where applicable.

Experiments and simulations were run on a model called Hive to show the effectiveness of the approach. Hive is not explained clearly with respect to the problem at hand.

The results are summarized in a table and a figure. The table shows only the errors injected into the system, not the effectiveness of the technique. The figure shows the time to completion of five multiprocessor combinations running three programs, but does not demonstrate the advantages, if any, of the technique. The accompanying text did not clarify the figure; it only confused me further.

This paper should not have appeared in CACM; it would have been better suited to a more specialized journal.

Reviewer: Ivan Flores
Review #: CR120472 (9706-0463)
Multiprocessing/Multiprogramming/Multitasking (D.4.1 ...)
Fault-Tolerance (D.4.5 ...)
Modeling And Prediction (D.4.8 ...)
General (D.4.0)
Performance (D.4.8)
Reliability (D.4.5)
Other reviews under "Multiprocessing/Multiprogramming/Multitasking": Date
Algorithms for scheduling homogeneous multiprocessor computers
Ondáš J., Springer-Verlag, London, UK, 1984. Type: Book (9789780387136578)
Aug 1 1985
Parallel programming
Perrott R., Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1987. Type: Book (9789780201142310)
Jul 1 1988
Operating systems: communicating with and controlling the computer
Keller L., Prentice-Hall, Inc., Upper Saddle River, NJ, 1988. Type: Book (9789780136380405)
Sep 1 1989