Logical data protection problems--such as accidental file deletion and data corruption due to a virus or a worm--can render files useless. A versioning file system can enable recovery from such failures. The problem is how to find the right files and versions, so that restoration is easy and correct.
This paper advances the notion that causality-based versioning can make it easier to select and recover the right versions of a file after a logical data protection problem occurs. Causality information is derived by examining the processes that read and write files, as well as any changes to the files, in order to determine how two files differ and from which file a given file is derived.
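To make the idea of deriving causality from process activity concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a hypothetical, time-ordered event log of `(pid, op, path)` tuples and applies one simple rule: a file written by a process may be derived from every file that process has read so far. The names `derive_causality` and `affected_by` are illustrative only.

```python
from collections import defaultdict


def derive_causality(events):
    """Build a 'may be derived from' map from a time-ordered event log.

    events: iterable of (pid, op, path) with op in {"read", "write"}.
    Assumption: a write by a process may depend on everything that
    process has read up to that point.
    """
    read_by = defaultdict(set)       # pid -> files the process has read so far
    derived_from = defaultdict(set)  # file -> files it may be derived from
    for pid, op, path in events:
        if op == "read":
            read_by[pid].add(path)
        elif op == "write":
            # A file is not recorded as derived from itself.
            derived_from[path] |= read_by[pid] - {path}
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return dict(derived_from)


def affected_by(derived_from, source):
    """Return every file transitively derived from `source`."""
    affected = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for target, sources in derived_from.items():
            if current in sources and target not in affected:
                affected.add(target)
                frontier.append(target)
    return affected


# Hypothetical log: a compiler reads a.c and writes a.o, a linker reads
# a.o and writes prog, and an unrelated process writes notes.txt.
events = [
    (1, "read", "a.c"),
    (1, "write", "a.o"),
    (2, "read", "a.o"),
    (2, "write", "prog"),
    (3, "write", "notes.txt"),
]
graph = derive_causality(events)
tainted = affected_by(graph, "a.c")
```

If `a.c` turns out to be corrupted, `tainted` names the files whose versions would also need to be examined during recovery, while `notes.txt` is left alone.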
The authors compare two causality-based algorithms--cycle-avoidance and graph-finesse--to the two traditional algorithms used in versioning file systems: open-to-close versioning and versioning on every write. Compared to the two traditional algorithms, the two new ones do not introduce any significant new space overheads on the compile, Postmark, or Mercurial activity workloads; in fact, they perform better than the versioning-on-every-write algorithm.
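The difference between the two traditional policies can be sketched with a simple version counter. This is an illustration under stated assumptions, not code from the paper. It assumes a hypothetical trace of `"open"`, `"write"`, and `"close"` operations on a single file, that versioning on every write creates one version per write, and that open-to-close versioning creates one version per open-close session in which the file was modified. The causality-based algorithms are not modeled here.

```python
def count_versions(trace, policy):
    """Count the versions a policy would create for one file.

    trace: list of "open", "write", or "close" operations in time order.
    policy: "every-write" or "open-to-close".
    """
    if policy not in ("every-write", "open-to-close"):
        raise ValueError(f"unknown policy: {policy!r}")
    versions = 0
    dirty = False  # has the file been written since the last open?
    for op in trace:
        if op == "open":
            dirty = False
        elif op == "write":
            if policy == "every-write":
                versions += 1
            dirty = True
        elif op == "close":
            if policy == "open-to-close" and dirty:
                versions += 1
            dirty = False
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return versions


# Hypothetical trace: three writes in one session, a read-only session,
# then a single write in a third session.
trace = ["open", "write", "write", "write", "close",
         "open", "close",
         "open", "write", "close"]
per_write = count_versions(trace, "every-write")
per_session = count_versions(trace, "open-to-close")
```

On this trace, versioning on every write creates four versions and open-to-close versioning creates two, which illustrates why per-write versioning tends to cost more space.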
As we depend on files more and more--files that are exposed to many risks, such as data corruption--the ability to quickly recover the right files is critical. This paper should be mandatory reading for anyone involved with file system design and development.