Computing Reviews

Towards a big data system disaster recovery in a private cloud
Chang V. Ad Hoc Networks 35(C): 65-82, 2015. Type: Article
Date Reviewed: 07/11/16

Big data has a growing impact on organizations and carries real business value, so the data must not be lost. This paper covers the role that disaster recovery (DR) needs to play in enabling big data systems to recover from possible data loss.

DR requires at least two geographically distributed sites. If the primary site with its original copy of the data becomes unavailable due to a disaster (such as a fire), the remote site can restore the data from its copy. The paper argues that having just two sites with one data copying method (that is, a “single-basket” approach) is inadequate for a big data system because total data recovery cannot be assured. Instead, the paper proposes a multisite and multitechnique approach.
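To make the single-basket critique concrete, consider a minimal sketch (mine, not the paper's; the site names follow its example, but the Replica type and recover function are hypothetical) of a recovery policy that falls back across sites and copy techniques:

    # Illustrative sketch only: models a multisite, multitechnique DR policy.
    # The recovery API is hypothetical, not taken from the paper.
    from dataclasses import dataclass

    @dataclass
    class Replica:
        site: str          # geographic location holding a copy
        technique: str     # how the copy was made (backup, snapshot, replication)
        healthy: bool      # whether this copy survived the disaster

    def recover(replicas: list[Replica]) -> Replica:
        """Try each surviving copy in turn; a single-basket setup has
        only one entry here, so one failure means total data loss."""
        for r in replicas:
            if r.healthy:
                return r
        raise RuntimeError("no surviving replica: data unrecoverable")

    # With multiple sites and techniques, losing one copy is survivable.
    replicas = [
        Replica("London", "replication", healthy=False),  # primary lost in disaster
        Replica("Southampton", "snapshot", healthy=True),
        Replica("Leeds", "backup", healthy=True),
    ]
    print(recover(replicas))  # falls back to the Southampton snapshot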

The paper uses an example of three English locations (London, Southampton, and Leeds), each of which can provide restoration services. In London, the big data system is a private cloud storage area network (SAN) architecture built from different network-attached storage (NAS) services. The big data files are medical records, where single files can be quite large (say, 1 GB). The paper compares three copying techniques: a TCP/IP baseline backup, snapshot, and replication. The author discusses experiments and results for each technique.
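The intuition behind comparing the three techniques can be sketched as a toy cost model (my illustration, not the paper's experiment; the 5 MB change volume is an assumed figure, and only the roughly 1 GB file size comes from the paper's scenario):

    # Toy per-cycle network-cost model for the three copying techniques.
    # All figures below are illustrative assumptions, not measured results.
    FILE_SIZE = 1_000_000_000   # ~1 GB medical record (the paper's scenario)
    CHANGED = 5_000_000         # assumed 5 MB modified since the last copy

    def baseline_backup_cost() -> int:
        # TCP/IP baseline: ship the whole file over the network each cycle.
        return FILE_SIZE

    def snapshot_cost() -> int:
        # Snapshot: a point-in-time image; copy-on-write means only blocks
        # changed since the previous snapshot need to be stored or shipped.
        return CHANGED

    def replication_cost() -> int:
        # Replication: forward each write as it happens, so steady-state
        # cost also tracks the change volume, spread continuously over time.
        return CHANGED

    for name, cost in [("baseline backup", baseline_backup_cost()),
                       ("snapshot", snapshot_cost()),
                       ("replication", replication_cost())]:
        print(f"{name}: ~{cost / 1e6:.0f} MB per cycle")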

Even though the big data architecture here is a traditional shared storage infrastructure, advocates of Hadoop-style big data infrastructures may still glean some valuable insights from this paper.

Reviewer: David G. Hill | Review #: CR144564 (1609-0665)
