Computing Reviews, the leading online review service for computing literature.

Search

Self-stabilizing clock synchronization in the presence of Byzantine faults
Dolev S., Welch J. Journal of the ACM51 (5):780-799,2004.Type:Article

Date Reviewed: Nov 17 2004

The devil is in the details. Twenty-five million Europeans died from the bubonic plague, the “Black Death,” in the Middle Ages. The European outbreaks have been traced to a single Italian merchant ship that picked up goods from China during an outbreak there; it was unknown in Europe beforehand. Mighty Athens fell to the same plague centuries earlier, when its citizens crowded together in the city during a Spartan siege. What triggered these outbreaks? Fleas, which transmitted the plague from rat hosts to humans. We now have computer systems reaching around the globe, spreading themselves farther and thinner (as in cell phones, and other handhelds). Is there some small detail that could bring them crashing down? The integrity of every transactional system, whether it’s a small database, or the largest business-to-business (B2B) network, depends on clocks. If the order of transactions cannot be determined, a transactional system can’t function. Imagine transactions being sent to a bank account. They can produce very different real-world results if you vary the order in which you apply them. Synchronized clocks are the heartbeat of distributed systems. This paper was published in the Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, in August 1995, as “Self-stabilizing clock synchronization with Byzantine faults.” I could only find a one-page summary when I looked it up in the ACM Digital Library today, but I used the entire paper a few years ago to design some middleware. Dolev and Welch build on the work of Lamport and Melliar-Smith [1]. They make several distinct contributions. First, they expand the robustness of the recovery protocol, to function with up to one-third of the processors known or suspected to be faulty. Two, they also remove the specification of unbounded clocks that was present in previous work, which makes their protocols usable on real-world systems. Third, they make the protocol self-stabilizing, so that no initial state has to be globally asserted, and to allow resynchronization (recovery) after transient faults, like more than a third of the processors becoming temporarily faulty. Finally, they present two protocols, one of which is dependent on a shared pulse. The article is 19 pages long, very readable, and, as mentioned above, was used to create real software.

Reviewer: Bayard Kohlhepp	Review #: CR130437 (0504-0452)

1)	Lamport, L.; Melliar-Smith, P.M. Synchronizing clocks in the presence of faults. J. ACM 32, (1985), 1–36.

Performance of Systems (C.4 )

Network Protocols (C.2.2 )

Organization And Design (D.4.7 )

Reliability (D.4.5 )

Would you recommend this review?

yes

Other reviews under "Performance of Systems":	Date

A computer and communications network performance analysis primer Stuck B., Arthurs E., Prentice-Hall, Inc., Upper Saddle River, NJ, 1985. Type: Book (9789780131639812)	Jun 1 1985

A mean value performance model for locking in databases Tay Y., Suri R. (ed), Goodman N. Journal of the ACM 32(3): 618-651, 1985. Type: Article	Mar 1 1986

The relationship between benchmark tests and microcomputer price Sircar S., Dave D. Communications of the ACM 29(3): 212-217, 1986. Type: Article	Nov 1 1986

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy