Totem is a single-ring protocol for high-performance, fault-tolerant distributed systems that must continue to operate despite network partitioning and re-merging and despite processor failure and restart. Totem provides totally ordered message delivery with good performance using a logical token-passing ring imposed on a broadcast domain.
After an introductory section, the authors present related work and highlight the differences of the Totem protocol. Significant literature on the subject is analyzed. Section 3 is dedicated to the distributed system model used in the Totem protocol design. Several terms related to protocol functioning are defined.
The objective of Totem is to provide the application with reliable message delivery and membership services. These services are described in section 4 of the paper. Section 5 is devoted to the total ordering protocol with the assumptions that the token is never lost; processor failures do not occur; and the network does not become partitioned. In section 6, the conditions are relaxed, and the protocol to handle token loss, processor failure and restart, and network partitioning and re-merging is presented. The protocol is described using a finite-state machine model. Data structures used, as well as pseudocode for the work performed by processors during different states of the model, are also given.
Sections 7 and 8 present the recovery protocol that maintains extended virtual synchrony during recovery after a failure, and the flow control mechanism that avoids message loss due to buffer overflow. Section 9 addresses implementation and performance. Future work is mentioned at the end of the paper.
The paper is well structured, but the presentation is not uniform, some aspects being described in great detail, while others are quickly summarized. The important works on the subject are included as references.