Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Self-stabilizing clock synchronization in the presence of Byzantine faults
Dolev S., Welch J. Journal of the ACM51 (5):780-799,2004.Type:Article
Date Reviewed: Nov 17 2004

The devil is in the details. Twenty-five million Europeans died from the bubonic plague, the “Black Death,” in the Middle Ages. The European outbreaks have been traced to a single Italian merchant ship that picked up goods from China during an outbreak there; it was unknown in Europe beforehand. Mighty Athens fell to the same plague centuries earlier, when its citizens crowded together in the city during a Spartan siege. What triggered these outbreaks? Fleas, which transmitted the plague from rat hosts to humans.

We now have computer systems reaching around the globe, spreading themselves farther and thinner (as in cell phones, and other handhelds). Is there some small detail that could bring them crashing down?

The integrity of every transactional system, whether it’s a small database, or the largest business-to-business (B2B) network, depends on clocks. If the order of transactions cannot be determined, a transactional system can’t function. Imagine transactions being sent to a bank account. They can produce very different real-world results if you vary the order in which you apply them. Synchronized clocks are the heartbeat of distributed systems.

This paper was published in the Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, in August 1995, as “Self-stabilizing clock synchronization with Byzantine faults.” I could only find a one-page summary when I looked it up in the ACM Digital Library today, but I used the entire paper a few years ago to design some middleware.

Dolev and Welch build on the work of Lamport and Melliar-Smith [1]. They make several distinct contributions. First, they expand the robustness of the recovery protocol, to function with up to one-third of the processors known or suspected to be faulty. Two, they also remove the specification of unbounded clocks that was present in previous work, which makes their protocols usable on real-world systems. Third, they make the protocol self-stabilizing, so that no initial state has to be globally asserted, and to allow resynchronization (recovery) after transient faults, like more than a third of the processors becoming temporarily faulty. Finally, they present two protocols, one of which is dependent on a shared pulse.

The article is 19 pages long, very readable, and, as mentioned above, was used to create real software.

Reviewer:  Bayard Kohlhepp Review #: CR130437 (0504-0452)
1) Lamport, L.; Melliar-Smith, P.M. Synchronizing clocks in the presence of faults. J. ACM 32, (1985), 1–36.
Bookmark and Share
  Editor Recommended
Featured Reviewer
 
 
Performance of Systems (C.4 )
 
 
Network Protocols (C.2.2 )
 
 
Organization And Design (D.4.7 )
 
 
Reliability (D.4.5 )
 
Would you recommend this review?
yes
no
Other reviews under "Performance of Systems": Date
A computer and communications network performance analysis primer
Stuck B., Arthurs E., Prentice-Hall, Inc., Upper Saddle River, NJ, 1985. Type: Book (9789780131639812)
Jun 1 1985
A mean value performance model for locking in databases
Tay Y., Suri R. (ed), Goodman N. Journal of the ACM 32(3): 618-651, 1985. Type: Article
Mar 1 1986
The relationship between benchmark tests and microcomputer price
Sircar S., Dave D. Communications of the ACM 29(3): 212-217, 1986. Type: Article
Nov 1 1986
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy