Computing Reviews
Damage Assessment for Optimal Rollback Recovery
Lin T., Shin K. IEEE Transactions on Computers 47(5): 603-613, 1998. Type: Article
Date Reviewed: Aug 1 1998

Distributed processing creates special problems in reconciling the work performed by the computational modules. A fault can produce computational errors in the module where it occurs, and those errors can propagate to other modules. Once this is detected, it is necessary to keep the work that is still good and redo the portions that have been contaminated. Lin and Shin examine this problem under conditions in which fault and error detection may be neither instantaneous nor complete. The paper has two major parts: a formal analysis of damage assessment and an evaluation of the optimal rollback points.

Damage assessment is examined using three cases: the error is detected and the faulty module is identified; the error is detected but the faulty module is unidentified; and a fault is uncovered by periodic diagnostics. For each of the cases, the authors generate sets of equations describing the density function for the conditional probability that a computational node’s contamination time is no later than some time t. The derivations are detailed and lengthy. Fortunately, the authors provide a list of symbols in this section to assist readers.
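To make the quantity concrete, here is a minimal discrete-time sketch of such a conditional distribution. It is not the authors' derivation: the prior over contamination steps, the per-step detection coverage, and the function name contamination_cdf are all assumptions introduced for illustration, and it covers only the simplest situation in which a single module's error is first detected at a known step.

```python
from fractions import Fraction


def contamination_cdf(prior, coverage, detected_at):
    """Return P(contamination step <= t | first detection at detected_at) for each t.

    prior[k]  -- assumed prior probability that contamination begins at step k
    coverage  -- assumed probability that an existing error is caught in any one step
    """
    weights = []
    for k in range(detected_at + 1):
        # The error exists from step k, is missed on every earlier check,
        # and is finally caught at step detected_at.
        missed = (1 - coverage) ** (detected_at - k)
        weights.append(prior[k] * missed * coverage)
    total = sum(weights)  # Bayes normalizer: probability of first detection at detected_at

    cdf, running = [], 0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf


# Illustrative numbers only: uniform prior over four steps, 50% coverage.
example = contamination_cdf([Fraction(1, 4)] * 4, Fraction(1, 2), 3)
# example holds the fractions 1/15, 1/5, 7/15, 1
```

With incomplete coverage the posterior still leans toward recent steps, but earlier contamination keeps nonzero probability, which is why the rollback point cannot simply be the most recent checkpoint.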

The evaluation of optimal rollback points is a problem in nonlinear integer programming. Two algorithms are developed, the first being a rollback algorithm that minimizes the mean recovery overhead, and the second being a branch-and-bound algorithm that serves the rollback algorithm by extracting and examining subsets of rollback points. A small simulation is provided as a demonstration. A list of symbols should have been attached to this section as well.
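The general shape of a branch-and-bound search over rollback points can be sketched as follows. This is a generic illustration, not the algorithm from the paper: it assumes an additive cost table (cost[i][j] is a hypothetical mean recovery overhead when module i rolls back to its j-th candidate point) and a hypothetical set of conflicting pairs that may not be chosen together, whereas the authors' objective is nonlinear.

```python
def best_rollback(cost, conflicts):
    """Choose one rollback point per module, minimizing total assumed overhead.

    cost[i][j] -- assumed overhead if module i rolls back to candidate j
    conflicts  -- set of frozensets {(i, j), (k, l)} that are mutually inconsistent
    Returns (total_cost, choices); choices is None if no consistent set exists.
    """
    n = len(cost)
    # Optimistic bound: cheapest candidate of each remaining module, ignoring conflicts.
    tail = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + min(cost[i])

    best = [float("inf"), None]  # incumbent cost and assignment

    def search(i, chosen, so_far):
        if so_far + tail[i] >= best[0]:
            return  # bound: this subtree cannot beat the incumbent
        if i == n:
            best[0], best[1] = so_far, list(chosen)
            return
        # Branch on the cheapest candidates first to tighten the incumbent early.
        for j in sorted(range(len(cost[i])), key=lambda j: cost[i][j]):
            if any(frozenset(((i, j), (k, chosen[k]))) in conflicts for k in range(i)):
                continue  # inconsistent with an earlier module's choice
            chosen.append(j)
            search(i + 1, chosen, so_far + cost[i][j])
            chosen.pop()

    search(0, [], 0)
    return best[0], best[1]


# Illustrative data only: three modules with two candidate points each.
overheads = [[1, 4], [2, 3], [5, 1]]
forbidden = {frozenset({(0, 0), (1, 0)}), frozenset({(1, 1), (2, 1)})}
result = best_rollback(overheads, forbidden)  # (7, [1, 0, 1])
```

The bound here is deliberately crude. A tighter bound prunes more subsets at the price of more work per node, which is the usual trade-off in this kind of search.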

In the last section, the authors discuss the integration of damage assessment with optimistic message logging and checkpointing schemes. They note that a significant problem in integrating damage assessment with an existing rollback scheme is the amount of stable storage required to log messages.

Reviewer: Anthony J. Duben
Review #: CR121965 (9808-0622)
Integer Programming (G.1.6 ... )
Asynchronous/Synchronous Operation (B.4.3 ... )
General (C.2.0)
Interconnections (Subsystems) (B.4.3)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®