Computing Reviews, the leading online review service for computing literature.

Recovery management in QuickSilver
Haskin R., Malachi Y., Chan G. ACM Transactions on Computer Systems 6(1): 82-108, 1988. Type: Article

Date Reviewed: Aug 1 1988

QuickSilver is a network operating system for IBM workstations connected by a token ring. QuickSilver provides system services as user-level processes that maintain client states. Servers are resilient to external failure and can recover resources associated with failed clients. The commit protocol and log recovery primitives are available to applications, so servers can tailor recovery techniques to their requirements, trading off simplicity and efficiency against recoverability.

The authors have adopted a high-overhead transaction mechanism in QuickSilver, but with the policy of using it only when necessary. To this end, servers are divided into four types: those that have volatile internal states and only require signaling capability, such as the window manager; those that manage replicated volatile states and use transaction commit for atomicity, like the name server; those that manage recoverable states and require a full panoply of recovery mechanisms, like the file server; and those that manipulate long-lived states and require log service for checkpointing. Only those that manage recoverable states are truly expensive in QuickSilver. Transaction overhead is further reduced by providing alternative commit protocols to servers, so servers can choose how much to pay for recovery.

Interprocess communication (IPC) addresses in QuickSilver are evidently site-dependent (contrary to the authors’ statement in section 2.1), so IPC is location-sensitive. Thus services (except for transaction management) are bound to nodes, migration is expensive, and load balancing (usually a fundamental rationale for a network operating system) is probably impractical. The QuickSilver IPC mechanism is heavily loaded, with responsibility for guaranteeing delivery and message ordering, for enforcing security constraints, and for maintaining transaction connectivity graphs. These overheads slow down processes that do not require the benefits provided and to some extent defeat the authors’ goal of paying optional overhead for optional services. The paper contains a comprehensive review of possible approaches and a wide-ranging survey of the distributed operating system literature.
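The four-way division of servers can be made concrete with a small sketch. It is only an illustration of the classification as summarized in this review, not QuickSilver's actual interface: the names `ServerClass`, `REQUIRED_MECHANISM`, `is_expensive`, and `describe` are all hypothetical.

```python
from enum import Enum


class ServerClass(Enum):
    """Hypothetical labels for the four server types named in the review."""
    VOLATILE = "volatile internal state"                # e.g. the window manager
    REPLICATED_VOLATILE = "replicated volatile state"   # e.g. the name server
    RECOVERABLE = "recoverable state"                   # e.g. the file server
    LONG_LIVED = "long-lived state"


# What each class requires, paraphrased from the review (not from the paper's API).
REQUIRED_MECHANISM = {
    ServerClass.VOLATILE: "signaling capability only",
    ServerClass.REPLICATED_VOLATILE: "transaction commit for atomicity",
    ServerClass.RECOVERABLE: "full set of recovery mechanisms",
    ServerClass.LONG_LIVED: "log service for checkpointing",
}


def is_expensive(server_class):
    """Per the review, only servers managing recoverable state are truly expensive."""
    return server_class is ServerClass.RECOVERABLE


def describe(name, server_class):
    """Return a one-line summary of what a given server has to pay for."""
    return f"{name}: {server_class.value}; needs {REQUIRED_MECHANISM[server_class]}"


if __name__ == "__main__":
    print(describe("window manager", ServerClass.VOLATILE))
    print(describe("file server", ServerClass.RECOVERABLE))
```

Read this way, the policy of paying for transactions only when necessary amounts to each server selecting the cheapest row of the table that still meets its recovery needs.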

Reviewer: Jason Gait	Review #: CR112497

Distributed File Systems (D.4.3 ... )

Checkpoint/ Restart (D.4.5 ... )

Distributed Databases (H.2.4 ... )

Fault-Tolerance (D.4.5 ... )

File Organization (D.4.3 ... )

Maintenance (D.4.3 ... )

Other reviews under "Distributed File Systems":	Date

Distributed file systems: concepts and examples Levy E., Silberschatz A. ACM Computing Surveys 22(4): 321-374, 1990. Type: Article	Nov 1 1991

Scale and performance in a distributed file system Howard J., Kazar M., Menees S., Nichols D., Satyanarayanan M., Sidebotham R., West M. ACM Transactions on Computer Systems 6(1): 51-81, 1988. Type: Article	Jul 1 1988

Technique for redundancy control in a distributed hierarchical filestore Lunn K., Bennett K. Information Technology Research Development Applications 3(3): 157-161, 1984. Type: Article	Dec 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®