Computing Reviews
Husky: towards a more efficient and expressive distributed computing framework
Yang F., Li J., Cheng J. Proceedings of the VLDB Endowment 9(5): 420-431, 2016. Type: Article
Date Reviewed: Feb 7 2017

Husky continues a line of scientific work on massive data computing, an approach that is still at the peak of its hype. This line of work will likely prompt further investigation into more effective frameworks for gaining insights into huge datasets with data-parallel mechanisms.

These platforms consist of two parts: a functional or declarative programming interface, and an underlying system that handles requests, connects to remote, distributed, very large data sources, and takes care of fault tolerance and load balancing. The platforms fall into two categories: general purpose, such as Spark, Dryad, and Hadoop, and domain specific, such as Pregel, GAS, and the parameter server framework. Husky is an open-source system that aims to give fine-grained control over the use of distributed resources and to support the design of more efficient algorithms. It tries to strike a better balance between high performance and low development cost, using in-memory large-scale data mining on a shared-nothing architecture over a cluster of machines.

The system is built on three entities: machine, worker, and object (local and global). In any cluster of machines, a master manages the workers; the workers contain objects, which are independent units that can be migrated, with data racing, socket programming, and data layout capabilities. The objects cooperate through the push, pull, and migrate primitive operations in a defined object interaction pattern. These operations can be performed in a synchronous (pipelined) or asynchronous manner. Object migration facilitates load balancing, whereas consistent hashing, which connects the workers in a hash ring, paves the way for fault tolerance in the platform.
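
The hash-ring idea mentioned above can be illustrated with a minimal Python sketch. This is not Husky's implementation: the `HashRing` class, the worker names, the object IDs, and the choice of 64 virtual points per worker are all hypothetical, chosen only to show the general property of consistent hashing that makes it attractive for fault tolerance, namely that when a worker disappears, only the objects it owned need a new home.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (stable across runs, unlike hash())."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring mapping object IDs to worker names (hypothetical)."""

    def __init__(self, workers, replicas=64):
        self.replicas = replicas  # virtual points per worker (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> worker name
        for worker in workers:
            self.add_worker(worker)

    def add_worker(self, worker):
        # Place several virtual points per worker to spread load more evenly.
        for i in range(self.replicas):
            point = _hash(f"{worker}#{i}")
            self._owner[point] = worker
            bisect.insort(self._points, point)

    def remove_worker(self, worker):
        # Drop every ring point that belonged to the departed worker.
        self._points = [p for p in self._points if self._owner[p] != worker]
        self._owner = {p: w for p, w in self._owner.items() if w != worker}

    def lookup(self, object_id):
        """Return the worker owning the first ring point clockwise of the ID."""
        idx = bisect.bisect_right(self._points, _hash(object_id))
        return self._owner[self._points[idx % len(self._points)]]


def demo():
    ring = HashRing(["worker-0", "worker-1", "worker-2", "worker-3"])
    ids = [f"obj-{i}" for i in range(1000)]
    before = {o: ring.lookup(o) for o in ids}
    ring.remove_worker("worker-2")  # simulate a failed worker
    after = {o: ring.lookup(o) for o in ids}
    moved = [o for o in ids if before[o] != after[o]]
    return before, after, moved
```

Calling `demo()` returns the object placements before and after the simulated failure, along with the list of objects that changed owner; in this sketch, every object in that list was previously held by the removed worker, and all other placements are untouched.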

This rich, tutorial-like paper touches on plenty of theoretical subjects concerning techniques for managing operations to achieve performance and efficiency. It offers great insights for developers who want to create similar systems.

The experimental evaluation covers bulk workloads, graph analytics, machine learning, and a Wikipedia pipeline, and considers fault tolerance, load balancing, and scalability; the results indicate that Husky outperforms other similar platforms. As a description of an open-source platform, the paper is intrinsically significant and admirable, although the semi-centralized structure of the system and possible conflicts between the global and local objects could be issues.

Reviewer: Mohammad Sadegh Kayhani Pirdehi
Review #: CR145045 (1705-0274)
Distributed Architectures (C.1.4 ... )
Data Mining (H.2.8 ... )
Distributed Databases (H.2.4 ... )
 
