In a replicated database, a data item may have copies residing on several sites. A replica control protocol is necessary to ensure that data items with several copies behave as if they consist of a single copy, as far as users can tell. We describe a new replica control protocol that allows the accessing of data in spite of site failures and network partitioning. This protocol provides the database designer with a large degree of flexibility in deciding the degree of data availability, as well as the cost of accessing data.
--Authors’ Abstract
This abstract correctly and succinctly summarizes the paper, which consists of five sections. The introduction presents the problem well. Section 2 describes the formal model and the criteria for correct operation. Section 3 proposes the protocol that maintains all the replicated copies, and Section 4 gives the formal proof of its correctness. Section 5 then proposes some modifications to the protocol that could make it operate faster without losing correctness.
One major problem I had was with yet another potential meaning for the word “partition.” In this paper, the partitioning is not of the data (objects 1 through n are on machine 1 and objects n+1 through m are on machine 2, or a similar partitioning by attribute), but rather a fracturing of the network that leaves some machines unable to communicate with others. While this is a correct English use of the term, I thought that database practitioners had finally begun to settle on a common use of it. Clearly not, unfortunately. Since the paper is well reviewed, this is not a problem with the authors and may be only a problem of my expectations.
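To make the two senses of “partition” concrete, here is a minimal sketch that is not taken from the paper. The function names, the round-robin placement rule, and the site and link names are all assumptions chosen for illustration. The first function partitions the data, assigning each object to one machine. The second computes a network partitioning, grouping sites into sets that can still reach one another over the links that remain up.

```python
from collections import defaultdict


def data_partition(object_ids, machines):
    """Data partitioning: place each object on exactly one machine.

    Round-robin placement is an illustrative assumption, not the paper's method.
    """
    placement = defaultdict(list)
    for i, obj in enumerate(object_ids):
        placement[machines[i % len(machines)]].append(obj)
    return dict(placement)


def network_partitions(sites, working_links):
    """Network partitioning: group sites that can still communicate.

    Each group is a connected component of the graph of working links.
    """
    neighbours = {s: set() for s in sites}
    for a, b in working_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sites:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over reachable sites
            s = stack.pop()
            if s in group:
                continue
            group.add(s)
            stack.extend(neighbours[s] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    print(data_partition([1, 2, 3, 4], ["m1", "m2"]))
    # Links A-B and C-D are up; every link between the two pairs has failed.
    print(network_partitions(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))
```

In the second sense, which is the one the paper uses, every site may hold a copy of the same data item; the difficulty is that sites in different groups cannot coordinate their reads and writes.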
The paper is well written, with examples well interspersed among the formal proofs. It is not light reading, and only those with a special interest in distributed replicated databases are likely to want to read it. However, the topic is timely and very important as “distribution” becomes real. I would like to see experimentation to determine whether the theory really works and whether it can be made to run with acceptable performance. At least, a reading of this version suggests that it would work, and that, given adequately fast communication links, performance would certainly be no worse than with a single copy of the data located somewhere other than on your local computer.