Computing Reviews, the leading online review service for computing literature.


Damage Assessment for Optimal Rollback Recovery
Lin T., Shin K. IEEE Transactions on Computers 47(5): 603-613, 1998. Type: Article

Date Reviewed: Aug 1 1998

Distributed processing creates special problems in reconciling the work performed by the computational modules. A fault can manifest as computational errors both in the module where it occurs and, through propagation, in other modules. When such errors are detected, the system must salvage the valid work and redo the portions that have been contaminated. Lin and Shin examine this problem under conditions in which fault and error detection may be neither instantaneous nor complete.

The paper has two major parts: a formal analysis of damage assessment and an evaluation of optimal rollback points. Damage assessment is examined in three cases: the error is detected and the faulty module is identified; the error is detected but the faulty module is not identified; and a fault is uncovered by periodic diagnostics. For each case, the authors derive sets of equations for the density function associated with the conditional probability that a computational node's contamination time is no later than some time t. The derivations are detailed and lengthy. Fortunately, the authors provide a list of symbols in this section to assist readers.

Finding the optimal rollback points is a nonlinear integer programming problem. The authors develop two algorithms: a rollback algorithm that minimizes the mean recovery overhead, and a branch-and-bound algorithm that supports the rollback algorithm by extracting and examining subsets of rollback points. A small simulation is provided as a demonstration. A list of symbols should have been included in this section as well.

In the last section, the authors discuss the integration of damage assessment with optimistic message logging and checkpointing schemes. They note that a significant problem in integrating damage assessment with an existing rollback scheme is the amount of stable storage required to log messages.
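The review does not reproduce the authors' equations, so the following is only a hedged worked example of the kind of quantity the damage-assessment section derives, for the first case (faulty module identified). It assumes, purely for illustration, that the error is detected at time t_d, that the detection latency is exponential with rate mu, and that the fault time has a flat prior on [0, t_d]; the symbols T_c, t_d, and mu are introduced here and are not taken from the paper. Under these assumptions, the conditional distribution of the faulty module's contamination time T_c and its density are:

```latex
F(t) = \Pr(T_c \le t \mid \text{error detected at } t_d)
     = \frac{e^{-\mu (t_d - t)} - e^{-\mu t_d}}{1 - e^{-\mu t_d}}, \qquad 0 \le t \le t_d,

f(t) = \frac{dF}{dt} = \frac{\mu \, e^{-\mu (t_d - t)}}{1 - e^{-\mu t_d}}.
```

As a check, F(0) = 0 and F(t_d) = 1; the density rises toward t_d because, under an exponential latency model, short detection delays are the most likely.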

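The review describes the rollback-point selection only at a high level, so the sketch below is a toy illustration of how a branch-and-bound search can minimize a mean-overhead objective over discrete rollback choices; it is not Lin and Shin's formulation. The cost model is an assumption made for this example: rolling module i back to a checkpoint at time c costs T - c of redone work, and a hypothetical restart penalty is added with the probability that any chosen checkpoint is already contaminated (using per-checkpoint contamination probabilities of the kind produced by damage assessment). The function names `expected_overhead`, `branch_and_bound`, and `brute_force` are likewise hypothetical.

```python
import itertools
import math
from typing import List, Optional, Tuple

# Hypothetical toy model (NOT the authors' formulation):
#   checkpoints[i][j] = time of the j-th checkpoint of module i (ascending)
#   F[i][j]           = P(module i contaminated no later than checkpoints[i][j]),
#                       nondecreasing in j
#   T                 = current time; rolling module i back to c redoes T - c work
#   penalty (>= 0)    = assumed extra cost if ANY chosen checkpoint is contaminated


def expected_overhead(choice: List[int], checkpoints: List[List[float]],
                      F: List[List[float]], T: float, penalty: float) -> float:
    """Mean recovery overhead of one rollback assignment under the toy model."""
    redo = sum(T - checkpoints[i][j] for i, j in enumerate(choice))
    p_clean = 1.0  # probability that every chosen checkpoint is uncontaminated
    for i, j in enumerate(choice):
        p_clean *= 1.0 - F[i][j]
    return redo + (1.0 - p_clean) * penalty


def branch_and_bound(checkpoints, F, T, penalty) -> Tuple[Optional[List[int]], float]:
    """Assign a rollback point to each module in turn, pruning hopeless branches."""
    n = len(checkpoints)
    min_redo = [T - max(cps) for cps in checkpoints]  # latest checkpoint = least redo
    min_F = [min(f) for f in F]                        # least contamination probability
    best_cost = math.inf
    best_choice: Optional[List[int]] = None

    def lower_bound(depth, redo_so_far, p_clean_so_far):
        # Optimistic completion: every unassigned module gets both its cheapest redo
        # and its lowest contamination probability, which no real choice can beat.
        p_clean_max = p_clean_so_far
        for i in range(depth, n):
            p_clean_max *= 1.0 - min_F[i]
        return redo_so_far + sum(min_redo[depth:]) + (1.0 - p_clean_max) * penalty

    def search(depth, choice, redo_so_far, p_clean_so_far):
        nonlocal best_cost, best_choice
        if depth == n:
            cost = redo_so_far + (1.0 - p_clean_so_far) * penalty
            if cost < best_cost:
                best_cost, best_choice = cost, list(choice)
            return
        for j, c in enumerate(checkpoints[depth]):
            redo = redo_so_far + (T - c)
            p_clean = p_clean_so_far * (1.0 - F[depth][j])
            if lower_bound(depth + 1, redo, p_clean) >= best_cost:
                continue  # prune: this branch cannot beat the incumbent
            choice.append(j)
            search(depth + 1, choice, redo, p_clean)
            choice.pop()

    search(0, [], 0.0, 1.0)
    return best_choice, best_cost


def brute_force(checkpoints, F, T, penalty) -> Tuple[List[int], float]:
    """Exhaustive reference used to check the branch-and-bound result."""
    best_choice, best_cost = [], math.inf
    for choice in itertools.product(*(range(len(c)) for c in checkpoints)):
        cost = expected_overhead(list(choice), checkpoints, F, T, penalty)
        if cost < best_cost:
            best_choice, best_cost = list(choice), cost
    return best_choice, best_cost


if __name__ == "__main__":
    cps = [[0.0, 4.0, 8.0], [0.0, 5.0, 9.0]]
    F = [[0.0, 0.1, 0.6], [0.0, 0.3, 0.9]]
    choice, cost = branch_and_bound(cps, F, T=10.0, penalty=20.0)
    print(choice, round(cost, 6))  # [1, 0] 18.0
```

In the usage example, rolling both modules back to their latest checkpoints would minimize redone work but carry a high risk of restarting from contaminated state, while the earliest checkpoints are safe but costly; the search settles on an intermediate assignment. The bound combines each module's cheapest redo with its cleanest checkpoint, which is unattainable in general but never overestimates, so pruning cannot discard the optimum.
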
Reviewer: Anthony J. Duben	Review #: CR121965 (9808-0622)

Integer Programming (G.1.6 ... )

Asynchronous/ Synchronous Operation (B.4.3 ... )

General (C.2.0 )

Interconnections (Subsystems) (B.4.3 )


Other reviews under "Integer Programming":	Date

Knapsack problems: algorithms and computer implementations Martello S., Toth P. (ed), John Wiley & Sons, Inc., New York, NY, 1990. Type: Book (9780471924203)	Feb 1 1992

Construction of test problems in quadratic bivalent programming Pardalos P. (ed), ACM Transactions on Mathematical Software 17(1): 74-87, 1991. Type: Article	Sep 1 1991

Integer and combinatorial optimization Nemhauser G., Wolsey L., Wiley-Interscience, New York, NY, 1988. Type: Book (9780471828198)	Mar 1 1989


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®