The fraction of fixed-content (immutable) data is likely greater than the fraction of mutable (changeable) data. Much of this fixed-content data (for instance, reference and compliance data) can be moved to an active archive, where users can still access it online.
The large-scale storage systems that will house archival data face numerous challenges, including scalability to meet ever-growing space demands, better space efficiency to reduce costs, and the ability to locate data easily and retrieve it swiftly from the archive. This paper presents the DeepStore archival storage architecture to address these challenges. Conceptually, the DeepStore architecture has four primary abstractions: storage objects (in effect, data such as files), physical storage components, a software architecture, and a storage interface. The authors go on to describe progressive redundancy elimination of similar and identical data in objects (PRESIDIO). PRESIDIO is a framework that incorporates multiple storage methods to reduce redundancy, shrinking the storage space that large archives require as much as possible.
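To make the idea of eliminating identical data concrete, the following is a minimal sketch of a content-addressed store that keeps each distinct chunk only once. It is not PRESIDIO or DeepStore code: the class name `DedupStore`, the fixed-size chunking, and the use of SHA-256 digests as chunk addresses are all assumptions chosen for illustration, and the sketch covers only identical data, not the similar-data methods the paper also considers.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each identical chunk."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # fixed chunk size in bytes (an assumption)
        self.chunks = {}              # digest -> chunk bytes, stored once
        self.objects = {}             # object name -> ordered list of digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only chunks not seen before."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an existing chunk untouched, so duplicates cost nothing
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.objects[name] = digests

    def get(self, name):
        """Reassemble an object from its chunk list."""
        return b"".join(self.chunks[d] for d in self.objects[name])

    def stored_bytes(self):
        """Total bytes of unique chunk data held by the store."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("a", b"abcdabcdwxyz")  # three chunks, two of them identical
    store.put("b", b"abcdwxyz")      # both chunks already present
    print(store.stored_bytes(), "bytes stored for", 12 + 8, "bytes written")
```

A store of this kind holds exactly one copy of each chunk, so losing that chunk damages every object that refers to it, which is why reintroducing redundancy for reliability becomes a question once duplicates have been removed.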
The authors then discuss the question of how to reintroduce redundancy for reliability. They review the work to date on this problem, but acknowledge that it remains a subject for future work. The paper should be very useful and informative for researchers in the field of archival storage.