Computing Reviews
DCDedupe: selective deduplication and delta compression with effective routing for distributed storage
Zhang B., Wang C., Zhou B., Yuan D., Zomaya A. Journal of Grid Computing 16(2): 195-209, 2018. Type: Article
Date Reviewed: Nov 2 2018

In big data scenarios, data is often duplicated to improve system response time and efficiency or to work around network issues, and the redundant copies are later eliminated through deduplication. Data is also compressed for storage efficiency and for further abstraction or regression. The authors propose DCDedupe, a system for on-premise distributed storage that selectively compresses data and carries out deduplication efficiently using analytics, hardware acceleration, and design parameters (cost, efficiency, and effectiveness). DCDedupe is centered on (1) a quick decision mechanism that yields acceptable accuracy and (2) an algorithm for selecting, marshaling, and routing data chunks so that they are sent to the right nodes of the distributed storage system.
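To make the two ideas concrete, the following is a minimal Python sketch of a store path that chooses between exact deduplication and delta compression for each chunk and routes chunks toward the node that already holds similar data. It is an illustration only and not the authors' algorithm: the names `Node`, `features`, `route`, and `store`, the window size, the number of features, and the `min_overlap` threshold are all assumptions, and zlib with a preset dictionary stands in for a real delta encoder.

```python
import hashlib
import zlib


def fingerprint(chunk: bytes) -> str:
    """Exact-match identity of a chunk (used for deduplication)."""
    return hashlib.sha256(chunk).hexdigest()


def features(chunk: bytes, window: int = 8, k: int = 4) -> frozenset:
    """Similarity sketch: the k smallest CRC32 values over sliding windows."""
    n = max(1, len(chunk) - window + 1)
    hashes = {zlib.crc32(chunk[i:i + window]) for i in range(n)}
    return frozenset(sorted(hashes)[:k])


class Node:
    """Hypothetical storage node holding raw chunks and a feature index."""

    def __init__(self, name: str):
        self.name = name
        self.raw = {}         # fingerprint -> raw chunk bytes
        self.by_feature = {}  # feature -> fingerprint of a raw chunk

    def overlap(self, feats) -> int:
        return sum(1 for f in feats if f in self.by_feature)


def route(chunk: bytes, nodes):
    """Pick the node sharing the most features; break ties by lowest load."""
    feats = features(chunk)
    return max(nodes, key=lambda n: (n.overlap(feats), -len(n.raw)))


def store(chunk: bytes, nodes, min_overlap: int = 2):
    """Return (node name, action, bytes written) for one chunk."""
    node = route(chunk, nodes)
    fp = fingerprint(chunk)
    if fp in node.raw:
        return node.name, "duplicate", 0  # exact copy: store nothing
    feats = features(chunk)
    matches = [node.by_feature[f] for f in feats if f in node.by_feature]
    if len(matches) >= min_overlap:
        # Similar chunk found: compress against it as a preset dictionary.
        base = node.raw[matches[0]]
        comp = zlib.compressobj(zdict=base)
        delta = comp.compress(chunk) + comp.flush()
        return node.name, "delta", len(delta)
    # Nothing similar: keep the raw chunk and index its features.
    node.raw[fp] = chunk
    for f in feats:
        node.by_feature.setdefault(f, fp)
    return node.name, "unique", len(chunk)
```

Calling `store` on a chunk, then on the same chunk again, and then on a slightly longer variant of it would be expected to produce a unique write, a duplicate hit, and a delta-compressed write on the same node. The cheap feature comparison plays the role of the quick decision step, and its threshold trades accuracy against processing speed.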

The paper is divided into six sections: “Introduction,” “Related Work,” “Deduplication vs. Delta Compression,” “Design,” “Evaluation,” and “Conclusions.” Drawing on conclusions from a case study, the design section describes DCDedupe's design principles and considerations, the choice of architecture and system, chunk classification methods, routing algorithms, delta compression levels, and the overall work and data flow. The evaluation section covers the experimental setup, storage efficiency results, sampling methods, and memory usage for sampling records. In the last section, the authors conclude that DCDedupe improves decision-making accuracy in pre-processing and reduces storage space requirements by 30 percent; however, there is some penalty on processing speed (between 15 and 22 percent). Further work on pre-processing methods, fault tolerance enhancements, and server overload is required.

The paper has 28 references and should interest people in the big data field.

Reviewer: Anoop Malaviya
Review #: CR146303 (1902-0035)
Data Storage Representations (E.2)
Cloud Computing (C.2.4 ...)
Distributed Systems (H.3.4 ...)
Grid computing (C.2.4 ...)

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®