Computing Reviews
A characteristic study on failures of production distributed data-parallel programs
Li S., Zhou H., Lin H., Xiao T., Lin H., Lin W., Xie T. ICSE 2013 (Proceedings of the 2013 International Conference on Software Engineering, San Francisco, CA, May 18-26, 2013), 963-972, 2013. Type: Proceedings
Date Reviewed: Aug 27 2014

A soft failure is the unsuccessful, premature termination of a data-parallel program, as opposed, for instance, to any kind of hardware failure. This paper, the first of its kind, undertakes a systematic evaluation of soft failures in a big data system. The study examines a random sample of 250 soft failures and provides a classification of root causes, as well as some insight into debugging and fixes.

This work is interesting for at least two reasons. First, it establishes a peer-reviewed benchmark on soft failures that is valuable for comparison with internal investigations of similar scope. Second, it provides material from the trenches that can serve as initial criteria for validating coding and software life cycle management practices in a rising discipline (big data) where there is much confusion and no established history. For instance, programmer error in misspelling a column name was one of the prominent sources of production soft failures. This may be surprising from the perspective of a traditional relational database management system (RDBMS) environment, where production schemas are static and the number of columns is relatively small. However, in a big data system, there may be thousands of column names, and the schema constraints are more dynamic. Combined with undocumented schema churn, the chain of events leading to this type of programmer error becomes easier to understand. Once it is understood, one can put safeguards in place against recurrence.
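As an illustration of one such safeguard, the sketch below checks the column names a job refers to against the current schema before the job is submitted, and suggests near matches for likely misspellings. It is a minimal example under stated assumptions, not the method used in the paper: the function name check_columns, the sample schema, and the sample column names are all hypothetical, and a real system would need to read the schema from its own catalog.

```python
import difflib


def check_columns(referenced, schema):
    """Return {unknown_name: [suggested_names]} for columns not in the schema.

    `referenced` and `schema` are plain lists of column-name strings
    (hypothetical inputs; a real job would extract them from the script
    and from the data catalog, respectively).
    """
    known = set(schema)
    problems = {}
    for name in referenced:
        if name not in known:
            # Suggest up to three schema columns that closely resemble the name.
            problems[name] = difflib.get_close_matches(name, schema, n=3, cutoff=0.6)
    return problems


# Hypothetical schema and job references, for illustration only.
schema = ["user_id", "session_id", "click_count", "country_code"]
referenced = ["user_id", "clik_count", "region"]

problems = check_columns(referenced, schema)
for bad_name, suggestions in problems.items():
    hint = ", ".join(suggestions) if suggestions else "no close match"
    print(f"unknown column {bad_name!r} (did you mean: {hint}?)")
```

Running a check of this kind at submission time turns a failure that would otherwise surface in production into an error reported before the job starts. It does not address schema churn itself, which would also require the schema to be kept current and documented.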

There are no groundbreaking results or findings from this work, but its novelty and incremental contributions are surely welcome.

Reviewer:  A. Squassabia Review #: CR142667 (1412-1056)
Metrics (D.2.8)
Parallelism And Concurrency (F.1.2 ...)
Software Development (K.6.3 ...)
Modes Of Computation (F.1.2)
Software Management (K.6.3)
