In big data scenarios, data is often replicated to improve system response time, efficiency, and tolerance of network issues, and the resulting redundancy is later eliminated through deduplication. Data is also compressed for storage efficiency and for further abstraction or regression. The authors propose DCDedupe, a system for on-premise distributed storage that selectively compresses data and performs deduplication efficiently, guided by analytics, hardware acceleration, and design parameters (cost, efficiency, and effectiveness). DCDedupe is centered on (1) a quick decision mechanism that yields acceptable accuracy and (2) an algorithm for selecting, marshaling, and routing data chunks so that they are sent to the right nodes of the distributed storage system.
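The paper itself contains no code; the following minimal Python sketch is my own illustration of the general idea of a quick, sampling-based chunk classification followed by fingerprint-based routing, not the authors' actual algorithm. The parameters NUM_NODES, SAMPLE_SIZE, and ENTROPY_THRESHOLD are assumptions for the example.

```python
# Hypothetical sketch, not the authors' algorithm: estimate a chunk's entropy
# from a small byte sample to decide whether delta compression is worthwhile,
# then route the chunk to a node by hashing its fingerprint so that duplicate
# chunks consistently land on the same node.
import hashlib
import math
import random

NUM_NODES = 4            # assumed cluster size
SAMPLE_SIZE = 256        # assumed number of sampled bytes per chunk
ENTROPY_THRESHOLD = 6.5  # assumed bits/byte; above this, skip delta compression

def sampled_entropy(chunk: bytes) -> float:
    """Estimate Shannon entropy (bits per byte) from a random sample of the chunk."""
    if not chunk:
        return 0.0
    sample = random.sample(chunk, min(SAMPLE_SIZE, len(chunk)))
    counts = {}
    for byte in sample:
        counts[byte] = counts.get(byte, 0) + 1
    total = len(sample)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def classify(chunk: bytes) -> str:
    """Quick decision: low-entropy chunks are candidates for delta compression."""
    return "delta-compress" if sampled_entropy(chunk) < ENTROPY_THRESHOLD else "dedupe-only"

def route(chunk: bytes) -> int:
    """Pick a target node from the chunk fingerprint (consistent for duplicates)."""
    fingerprint = hashlib.sha1(chunk).digest()
    return int.from_bytes(fingerprint[:8], "big") % NUM_NODES

if __name__ == "__main__":
    chunk = b"example log line\n" * 200
    print(classify(chunk), "-> node", route(chunk))
```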
The paper is divided into sections: “Introduction,” “Related Work,” “Deduplication vs. Delta Compression,” “Design,” “Evaluation,” and “Conclusions.” Drawing on the conclusions of a case study, the design section describes DCDedupe's design principles and considerations, the choice of architecture and system, chunk classification methods, routing algorithms, delta compression levels, and the overall work and data flow. The evaluation section covers the experimental setup, storage efficiency results, sampling methods, and the memory usage of sampling records. In the last section, the authors conclude that DCDedupe improves decision-making accuracy in pre-processing and reduces storage space requirements by 30 percent, at the cost of some penalty in processing speed (between 15 and 22 percent). Further work is required on pre-processing methods, fault tolerance enhancements, and handling server overload.
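To make the “Deduplication vs. Delta Compression” distinction concrete, here is a small Python illustration of my own (not taken from the paper): exact duplicates can be removed by deduplication alone, while a near-duplicate chunk must either be stored in full or encoded as a small delta against an already-stored base chunk, approximated here with zlib's preset-dictionary support.

```python
# Illustrative only: contrast deduplication (exact match) with delta compression
# (near-duplicate encoded against a stored base chunk).
import random
import zlib

random.seed(0)
base = bytes(random.getrandbits(8) for _ in range(4000))  # previously stored chunk
similar = base[:3990] + bytes(10)                          # near-duplicate: last 10 bytes differ

# Deduplication removes only exact duplicates, so this chunk would be stored in full.
print("exact duplicate:", similar == base)

# Delta compression: encode the similar chunk against the base as a preset dictionary.
compressor = zlib.compressobj(zdict=base)
delta = compressor.compress(similar) + compressor.flush()
print("full size:", len(similar),
      "compressed alone:", len(zlib.compress(similar)),
      "delta size:", len(delta))

# Reconstruction needs the base chunk, which is one reason a system like DCDedupe
# must route related chunks to the right node.
decompressor = zlib.decompressobj(zdict=base)
assert decompressor.decompress(delta) + decompressor.flush() == similar
```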
The paper has 28 references and should interest readers working in the big data field.