Since the beginning of the Computer Age, databases have stored data and applications have used that data. The use of the same data at the same time by different applications is called concurrency, and it can cause problems. Various methods have been devised over time to control concurrency and avoid these problems; however, these methods are now showing their limits, particularly when data is distributed across several machines. Today, most data access is organized into transactions: sequences of data accesses that must logically be treated as a single operation. Distributed transactions raise dependency problems: it is not always clear which application accessed a given piece of data first, and in such cases transactions must be rolled back, that is, the data must be restored to its initial state. Many current systems address this with transaction partitioning, that is, executing transactions in batches, but batching in turn complicates rollbacks. Rollbacks are usually supported by logging transactions at runtime, which adds overhead, and when a failure occurs and recovery is needed, some downtime is unavoidable. Moreover, most of these transactions live in memory rather than on disk; the bottleneck is therefore no longer input/output (I/O) writes, but the logging activity itself.
The proposed dependency graph-based concurrency control (DGCC) protocol uses graphs to control and log distributed concurrent transactions at the same time. It executes both transactions and logging in batches, reducing transaction time by avoiding deadlocks and logging time by avoiding aborts. At the heart of the protocol are transaction dependency graphs. The paper defines them with formal definitions, figures, and pseudocode for synchronous execution, and shows how these graphs suit both multithreaded single-node environments and, via a coordination mechanism, multithreaded multi-node distributed environments. This is the most remarkable part of the paper: despite technical language appropriate for an academic work, the authors present solutions, uncover the problems those solutions raise, and then overcome them with new solutions in an almost thrilling way.
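To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of how a batch of transactions can be turned into a dependency graph and executed in dependency order. All names (`Transaction`, `build_dependency_graph`, `execute_batch`) and the read/write-set conflict model are hypothetical simplifications for illustration: an edge is drawn from an earlier transaction to a later one whenever their read/write sets conflict, so the graph is acyclic by construction, deadlock cannot arise, and the execution order itself serves as a log.

```python
from collections import defaultdict


class Transaction:
    """A toy transaction with explicit read and write sets (hypothetical model)."""

    def __init__(self, tid, reads, writes, action):
        self.tid = tid
        self.reads = set(reads)
        self.writes = set(writes)
        self.action = action  # callable applied to the shared store


def build_dependency_graph(txns):
    """Add an edge i -> j when an earlier transaction i conflicts with a
    later transaction j (write-write, write-read, or read-write overlap)."""
    edges = defaultdict(set)
    for i, a in enumerate(txns):
        for b in txns[i + 1:]:
            if (a.writes & b.writes) or (a.writes & b.reads) or (a.reads & b.writes):
                edges[a.tid].add(b.tid)
    return edges


def execute_batch(txns, store):
    """Execute a batch in dependency order. Because edges always point from
    earlier to later arrivals, the graph is acyclic and no deadlock is
    possible; independent transactions could run concurrently. The resulting
    execution order doubles as the recovery log."""
    edges = build_dependency_graph(txns)
    indegree = {t.tid: 0 for t in txns}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    by_id = {t.tid: t for t in txns}
    ready = [tid for tid, deg in indegree.items() if deg == 0]
    log = []
    while ready:
        tid = ready.pop()
        by_id[tid].action(store)
        log.append(tid)
        for d in edges[tid]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return log
```

For example, if transaction 1 writes record `x` and transaction 2 reads `x`, the graph forces 1 before 2, while a transaction touching only record `z` is unconstrained and free to run in parallel with both.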
The final part of the paper contains tests and simulations that show improvements in three areas: overhead, reduced because logging is performed alongside transaction execution; scalability, because graphs are easily extended; and overall runtime performance, thanks to concurrent execution.