To keep a database system operational after a disaster, a backup copy of the database and of the changes applied to it, that is, the redo-log, must be maintained at a geographically remote location. If the redo-log is only applied at the backup site after a disaster, the impact on the production system is small; if it is applied continuously, as if the backup site were part of a distributed database system, the impact is significant. In neither case is data lost; the two options differ only in the time needed to recover after a disaster. The algorithms used in both cases are called 2-safe. Algorithms that allow fast recovery but still have only a small impact on the production system are called 1-safe; with 1-safe algorithms, data may be lost.
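The trade-off can be made concrete with a small sketch. The Python fragment below is illustrative only: it assumes a primary that ships redo-log records to a single remote backup, and the class names and the acknowledgment step are assumptions of this sketch, not the protocols studied in the paper.

    # Illustrative sketch only; not the algorithms from the paper.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Backup:
        """Remote site that stores (and may later apply) redo-log records."""
        log: List[str] = field(default_factory=list)

        def receive(self, record: str) -> bool:
            self.log.append(record)   # record is now safely at the remote site
            return True               # acknowledgment sent back to the primary

    @dataclass
    class Primary:
        backup: Backup
        log: List[str] = field(default_factory=list)

        def commit_2safe(self, record: str) -> None:
            # 2-safe: the transaction is not acknowledged to the client until
            # the backup holds the redo-log record, so no committed data can
            # be lost, but every commit pays a round trip to the remote site.
            self.log.append(record)
            assert self.backup.receive(record)   # wait for the remote ack

        def commit_1safe(self, record: str) -> None:
            # 1-safe: commit locally and ship the record asynchronously; the
            # client sees low latency, but records still in transit are lost
            # if a disaster strikes the primary.
            self.log.append(record)
            # (the record would be queued here and shipped in the background)

The essential difference lies in the two commit paths: the 2-safe path pays a remote round trip on every transaction, while the 1-safe path defers shipping and therefore risks losing whatever is still in transit.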
Polyzois and García-Molina present their research on the performance of a 2-safe algorithm and two 1-safe algorithms in a distributed environment. They explain their “dependency reconstruction” and “epoch” algorithms and their testbed in detail, but they do not define the format of their redo-log file. From the context, we can guess that they assume a redo-log file consisting of transaction and database records; another possibility would be a redo-log file recording the physical data pages that were changed. I also missed an explanation of why anyone who needs continuous operation would accept the loss of data.
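For concreteness, the two log formats alluded to above might look roughly as follows; these record layouts are hypothetical, since the paper does not define its redo-log file.

    # Hypothetical record layouts; neither format is specified in the paper.
    from dataclasses import dataclass

    @dataclass
    class LogicalRedoRecord:
        # Transaction/record-level logging: replay re-executes the update.
        transaction_id: int
        table: str
        key: str
        new_value: bytes

    @dataclass
    class PhysicalRedoRecord:
        # Page-level logging: replay overwrites the changed data page.
        transaction_id: int
        page_id: int
        page_image: bytes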
The algorithms perform as expected. Was the outcome ever in doubt? Database administrators will appreciate this paper less for its performance study than for the clarity of its presentation of disaster recovery mechanisms.