Quicksilver is a network operating system for IBM workstations connected by a token ring. Quicksilver provides system services as user-level processes that maintain client states. Servers are resilient to external failure and can recover resources associated with failed clients. The commit protocol and log recovery primitives are available to applications so servers can tailor recovery techniques to requirements, trading off simplicity and efficiency against recoverability.
The authors have adopted a high-overhead transaction mechanism in Quicksilver, but with the policy of using it only when necessary. To this end, servers are divided into four types: those that have volatile internal states and only require signaling capability, such as the window manager; those that manage replicated volatile states and use transaction commit for atomicity, like the name server; those that manage recoverable states and require a full panoply of recovery mechanisms, like the file server; and those that manipulate long-lived states and require log service for checkpointing. Only those that manage recoverable states are truly expensive in Quicksilver. Transaction overhead is further reduced by providing alternative commit protocols to servers, so servers can choose how much to pay for recovery.
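The four-way classification can be pictured as a lookup from server class to the recovery machinery that class pays for. The following is only an illustrative sketch of the summary above, not Quicksilver's actual interface: every name in it (`ServerClass`, `Mechanism`, `mechanisms_for`) is hypothetical, and reading "full panoply" as "all listed mechanisms" is an assumption.

```python
from enum import Enum, auto


class Mechanism(Enum):
    """Recovery facilities named in the review (hypothetical labels)."""
    SIGNALING = auto()        # notification of client failure or termination
    COMMIT = auto()           # transaction commit, used for atomicity
    LOG_RECOVERY = auto()     # log-based recovery of server state
    CHECKPOINT_LOG = auto()   # log service used for checkpointing


class ServerClass(Enum):
    """The four server types described in the review."""
    VOLATILE = "volatile internal state (e.g. window manager)"
    REPLICATED_VOLATILE = "replicated volatile state (e.g. name server)"
    RECOVERABLE = "recoverable state (e.g. file server)"
    LONG_LIVED = "long-lived state"


# Assumption: "full panoply" is modeled as every mechanism listed above.
REQUIRED = {
    ServerClass.VOLATILE: {Mechanism.SIGNALING},
    ServerClass.REPLICATED_VOLATILE: {Mechanism.COMMIT},
    ServerClass.RECOVERABLE: set(Mechanism),
    ServerClass.LONG_LIVED: {Mechanism.CHECKPOINT_LOG},
}


def mechanisms_for(server_class: ServerClass) -> set:
    """Return a copy of the mechanisms a server of this class would use."""
    return set(REQUIRED[server_class])


if __name__ == "__main__":
    for cls in ServerClass:
        names = sorted(m.name for m in mechanisms_for(cls))
        print(f"{cls.name}: {', '.join(names)}")
```

Under this reading, only the recoverable class carries every mechanism, which matches the observation that it is the only truly expensive one.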
Interprocess communication (IPC) addresses in Quicksilver are evidently site-dependent (contrary to the authors’ statement in section 2.1), so IPC is location-sensitive. Thus services (except for transaction management) are bound to nodes, migration is expensive, and load balancing (usually a fundamental rationale for a network operating system) is probably impractical. The Quicksilver IPC mechanism is heavily loaded, with responsibility for guaranteeing delivery and message ordering, for enforcing security constraints, and for maintaining transaction connectivity graphs. These overheads slow down processes that do not require the benefits provided and to some extent defeat the authors’ goal of paying optional overhead only for optional services.
The paper contains a comprehensive review of possible approaches and a wide-ranging survey of the distributed operating system literature.