This readable and instructive paper focuses on algorithms for deadlock detection, rather than deadlock prevention or resolution. Knapp begins by formulating a general model for resource allocation in distributed systems, distinguishing between six different models of increasing generality. In the simplest model, any process within the system may request and monopolize no more than one resource at a time, while in the most general model, all possible variants of overlapping requests for multiple resources are permitted. The types of deadlock that can result in each type of system are different, and the complexity of the required deadlock detection algorithms also differs.
Knapp classifies deadlock detection algorithms into four main types: (1) path-pushing algorithms, (2) edge-chasing algorithms, (3) diffusing computations, and (4) global state detection algorithms. In a path-pushing algorithm, for instance, the processes exchange complete information about the resources they are waiting for. Each process is then able to construct a wait-for-graph of the system as a whole and detect the deadlock configurations. Knapp writes: “One noteworthy point about path-pushing algorithms is that many of them were found to be incorrect, either by not detecting true deadlocks, by discovering phantom deadlocks, or both.” Knapp gives a sufficient number of examples of published and “proved” algorithms to validate his claim.
In an edge-chasing algorithm, the processes can issue special messages called probes. A probe message is forwarded to all processes that the receiving process waits for. A deadlock can be declared if a probe message makes its way back to the originating process. As an example of a simple and well-designed algorithm of this type, the author discusses in some detail the work of Mitchell and Merritt [1]. Their algorithm has the added advantage that the proofs of correctness are elegant and, says Knapp, “fun to read (and write).” The last two types of algorithms, diffusing computations and global state detection, are based on early work by Dijkstra and Lamport, respectively. Knapp gives detailed motivations and examples of each type of algorithm.
One of Knapp’s conclusions is particularly worth quoting: “The large number of errors in published algorithms addressing the problem of distributed deadlock detection . . . shows that only rigorous proofs, using as little operational argumentation as possible, suffice to show the correctness of these algorithms.” Some of these errors have escaped notice for many years. This paper is a laudable attempt to set the record straight.