Computing Reviews
On the impact of programming languages on code quality: a reproduction study
Berger E., Hollenbeck C., Maj P., Vitek O., Vitek J. ACM Transactions on Programming Languages and Systems 41(4): 1–24, 2019. Type: Article
Date Reviewed: Nov 21 2019

E. W. Dijkstra’s classic A discipline of programming [1] has rightly achieved the kind of permanent validity that the mathematician G. H. Hardy spoke of in his famous A mathematician’s apology [2]. Dijkstra’s preface explains, to the satisfaction of many computing scientists and certainly to mine, why he chose not to express his algorithms in a real, existing programming language (or a subset of one). He goes on to imply, in tongue-in-cheek “new math” jargon, that “powerful” language features most likely “belong to the problem set” rather than the “solution set.”

This paper addresses, with thorough empiricism and a varied methodology, the relationship between programming errors and the languages in which programs are implemented. It is a detailed reworking of, and hence a response to, a widely influential 2014 paper [3] on this subject that used data from 729 projects hosted on GitHub and covered 11 programming languages. The present authors initially intended to repeat the 2014 experiment on all 11 languages and to reanalyze the same data independently, but “missing code and problems with the classification of languages” led them to use and analyze a subset of the original projects. The paper describes how, and to what degree, each of the following assertions (quoted verbatim and labeled as research questions) was or was not confirmed:

(RQ1) “Some languages have a greater association with defects than others, although the effect is small.”
(RQ2) “There is a small but significant relationship between language class [procedural, functional, script] and defects.”
(RQ3) “There is no general relationship between domain and language defect proneness.”
(RQ4) “Defect types are strongly associated with languages.”

Though not explicit in RQ1 through RQ4, programming language typing--strong, weak, static, dynamic--figured in the language comparison.

The two articles (this paper and [3]) should be read in tandem to appreciate the large and evidently honest effort that went into each. The authors of this paper thank some of the authors of the earlier one for “sharing [whatever was preserved of] the data and code.”

The present paper concludes, however, that “the conclusions of the [2014] paper do not hold.” It undertook two efforts: repetition and reanalysis. For repetition, RQ1 “produced small differences but qualitatively similar conclusions,” while the outcome for RQ2 differed for two of the five categories. Missing code and “differences in the data” prevented RQ3 and RQ4 from being repeated. Reanalysis “failed to validate” the previous results for seven of the 11 languages; for the remaining four, the effects retain small “practical significance.” The paper also points out that many readers inferred a causal link between language and errors from the 2014 results, when those results support only a correlation.
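The review does not say why the reanalysis lost significance for most languages. One standard reason a batch of individually “significant” results shrinks is correction for multiple comparisons: testing 11 languages at once inflates the chance of false positives. The following sketch assumes made-up p-values (not either paper's data) and applies two textbook procedures, Bonferroni and Benjamini–Hochberg, to show how the count of significant results can drop. It illustrates the general effect only and is not a reconstruction of either paper's analysis.

```python
# Multiple-comparison corrections applied to one p-value per language.
# The p-values below are HYPOTHETICAL, chosen only to illustrate the effect.

def bonferroni(p_values, alpha=0.05):
    """Reject H0 for test i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= largest_k
    return rejected


# Hypothetical p-values for 11 languages (not taken from either study).
hypothetical_p = [0.001, 0.004, 0.012, 0.020, 0.030, 0.035,
                  0.040, 0.045, 0.048, 0.20, 0.50]

if __name__ == "__main__":
    print("uncorrected:", sum(p <= 0.05 for p in hypothetical_p))             # 9
    print("Bonferroni:", sum(bonferroni(hypothetical_p)))                     # 2
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(hypothetical_p)))     # 3
```

Bonferroni is the more conservative of the two. Benjamini–Hochberg tolerates a controlled fraction of false discoveries and so usually keeps more results.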

One thing jumps out at the reader: whatever differences from the original study were established, they were “small” and “modest.” This is an instance of dealing with big data, complete with the inevitable description of sausage making (my metaphor) in the form of reclassification and reverse engineering, which I consider valuable real-world reading. In this regard, the sections “Grep Considered Harmful” and “Be Wary of P-Values” are instructive and apposite. Clearly, confounding factors external to the statistical methodology had a nontrivial effect.
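The review does not detail what the “Grep Considered Harmful” section covers. As a general illustration of why plain text matching is a risky way to label commits as bug fixes, the following sketch compares substring matching (what an unanchored, case-insensitive grep does) with whole-word matching. The keyword list and commit messages are hypothetical, not the lists or data used in either paper.

```python
import re

# HYPOTHETICAL keyword list for flagging bug-fix commits (illustration only).
KEYWORDS = ["bug", "fix", "error", "issue", "defect", "fault"]


def substring_is_bugfix(message: str) -> bool:
    """Case-insensitive substring match, like an unanchored `grep -i`."""
    text = message.lower()
    return any(k in text for k in KEYWORDS)


def word_is_bugfix(message: str) -> bool:
    """Case-insensitive whole-word match using regex word boundaries."""
    return any(re.search(rf"\b{k}\b", message, re.IGNORECASE) for k in KEYWORDS)


# HYPOTHETICAL commit messages.
COMMITS = [
    "Fix null pointer dereference in parser",  # a genuine fix
    "Add prefix option to CLI",                # "prefix" contains "fix"
    "Update debugger docs",                    # "debugger" contains "bug"
    "Improve error message wording",           # mentions "error", not a fix
    "Refactor logging module",                 # no keyword at all
]

if __name__ == "__main__":
    for msg in COMMITS:
        print(f"{substring_is_bugfix(msg)!s:5} {word_is_bugfix(msg)!s:5} {msg}")
```

Whole-word matching removes the “prefix” and “debugger” false positives but still flags the wording change as a fix. No keyword rule can tell what a commit actually did, which is one reason manually inspecting a sample of labeled commits is worthwhile.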

At the end, the authors “reiterate the need for automated and reproducible studies,” so as to reduce the “enormous” amount of work and consequent possibility of errors.

Reviewer: George Hacken. Review #: CR146790 (2004-0077)
1) Dijkstra, E. W. A discipline of programming. Prentice-Hall, Englewood Cliffs, NJ, 1976.
2) Hardy, G. H. A mathematician’s apology. Cambridge University Press, Cambridge, UK, 1940.
3) Ray, B.; Posnett, D.; Devanbu, P. T.; Filkov, V. A large-scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2014, 155–165.

Categories:
General (D.2.0)
Debug (D.2.5 ...)
Validation (D.2.4 ...)
Software/Program Verification (D.2.4)

Other reviews under "General":
Development of distributed software
Shatz S. (ed), Macmillan Publishing Co., Inc., Indianapolis, IN, 1993. Type: Book (9780024096111)
Date: Aug 1 1994
Fundamentals of software engineering
Ghezzi C., Jazayeri M., Mandrioli D., Prentice-Hall, Inc., Upper Saddle River, NJ, 1991. Type: Book (013820432)
Date: Jul 1 1992
Software engineering
Sodhi J., TAB Books, Blue Ridge Summit, PA, 1991. Type: Book (9780830633425)
Date: Feb 1 1992

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®