Since the early 1970s, many researchers have addressed the subject of database replication protocols, mostly in conjunction with discussions of distributed databases. The topic is difficult because the performance, reliability, data-consistency, and fault-tolerance requirements of these protocols conflict. Many complex schemes have been proposed, but few have been implemented in commercially available database management systems (DBMSs). Following the methodology first introduced by Gray et al., the authors classify replication mechanisms into four classes, corresponding to the four quadrants of a matrix (see Table 1 in the paper). On one axis, mechanisms are either “eager” or “lazy,” according to when replica updates are applied; on the other, they are either “primary copy” or “update everywhere,” according to where updates may be applied. Primary copy is the centralized approach, while update everywhere is the distributed approach.
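The two-axis classification can be made concrete with a small sketch. The quadrant descriptions below are illustrative paraphrases, not text from the paper or from Gray et al.:

```python
# Illustrative sketch of the Gray et al. replication matrix.
# The descriptions are paraphrases for illustration, not quotations.
REPLICATION_MATRIX = {
    ("eager", "primary copy"):
        "updates applied synchronously, coordinated by a single master",
    ("eager", "update everywhere"):
        "updates applied synchronously at every replica",
    ("lazy", "primary copy"):
        "master commits first, then propagates updates asynchronously",
    ("lazy", "update everywhere"):
        "any replica commits first, then propagates asynchronously",
}

def classify(timing: str, placement: str) -> str:
    """Return the quadrant description for a (timing, placement) pair."""
    return REPLICATION_MATRIX[(timing, placement)]
```

The protocols studied in this paper fall in the eager column, with the authors targeting the update-everywhere row.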
Kemme and Alonso maintain that their protocols follow a solution based on a combination of ideas from group communication and concurrency control. They rely on group communication systems that provide group maintenance, reliable message exchange, and message-ordering primitives among a group of nodes in a distributed system. Using a read-one/write-all-available algorithm, they reduce the number of messages per transaction. Group communication also ensures that all messages are received in the same total order at all sites. Assuming all operations of a transaction are sent in a single message, transactions arrive at all sites in the same order. By granting locks in the order of transaction arrival, all sites perform the same updates in exactly the same order, which avoids distributed deadlock.
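The core idea, total-order delivery plus in-order lock granting, can be sketched in a few lines. This is a simplified simulation under my own assumptions (a single sequencer loop standing in for the group communication system; names like `TotalOrderBroadcast` and `Site` are hypothetical), not the authors' implementation:

```python
from collections import defaultdict

class Site:
    """A replica that enqueues lock requests strictly in delivery order,
    so all sites serialize conflicting transactions identically."""
    def __init__(self, name):
        self.name = name
        self.lock_queue = defaultdict(list)  # data item -> waiting txn ids
        self.applied = []                    # delivery order of transactions
    def deliver(self, txn_id, write_set):
        # All of a transaction's operations arrive in one message, so the
        # whole write set is enqueued atomically in arrival order.
        for item in sorted(write_set):
            self.lock_queue[item].append(txn_id)
        self.applied.append(txn_id)

class TotalOrderBroadcast:
    """Stand-in for a group communication primitive: every message is
    delivered to every site in the same total order."""
    def __init__(self, sites):
        self.sites = sites
    def broadcast(self, txn_id, write_set):
        for site in self.sites:
            site.deliver(txn_id, write_set)

sites = [Site("A"), Site("B"), Site("C")]
gcs = TotalOrderBroadcast(sites)
gcs.broadcast("t1", {"x", "y"})
gcs.broadcast("t2", {"y"})
# Every site sees t1 before t2, so t2 waits for t1's lock on y everywhere.
assert all(s.applied == ["t1", "t2"] for s in sites)
```

Because every site grants the lock on `y` to `t1` before `t2`, no two sites can block each other in a cycle, which is the deadlock-freedom argument summarized above.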
The authors' next improvement is to relax the strict serializability enforced by commercial DBMSs. They use alternative correctness criteria that permit lower levels of isolation; the different levels trade correctness for performance and allow more efficient implementations, depending on the situation. Their final improvement optimizes the use of different levels of fault tolerance. This approach is analogous to the one taken with isolation levels: full correctness can be weakened in exchange for faster solutions, and the reliability of message delivery determines the overall correctness. While complex message-exchange mechanisms guarantee the atomicity of transactions despite failures, faster communication protocols provide only a best-effort approach.
The paper’s nine sections are as follows. Section 1 is the introduction. Section 2 provides an overview of existing replication solutions. Section 3 presents the basic concepts behind the authors’ protocols. Section 4 describes the system model. Section 5 presents a family of protocols providing different levels of isolation. Section 6 refines the algorithms to provide different levels of fault tolerance. Section 7 provides an overview of the simulation system used to analyze the protocols. Section 8 describes the experiments conducted and their results. Section 9 gives the authors’ conclusions. Appendices A and B provide more formal proofs of the protocols in sections 5 and 6. There are more than 60 references at the end of the paper.
This is a fairly comprehensive paper on eager database replication protocols, which are important in implementing distributed DBMSs in a network of nodes. This paper should be read by DBMS practitioners who are interested in practical solutions to the complexities of database replication protocols in a real-world environment, in which compromises are mandatory.