Traditionally, science and engineering involve a fair amount of empirical work; it is, after all, one of the foundations of the scientific method. Until recently, there was rather a dearth of empirical work in both computer science (CS) and software engineering (SE). Even rather simple questions, such as what effect (if any) static types have on basic activities such as software maintenance, were unanswered. Oh, there is no lack of opinion on the matter! But even opinions of highly learned professionals are no replacement for empirical studies.
Sadly, the vast majority of software professionals, whether in industry or in academia, are not well-versed in experimental protocols. While the paper at hand is not a survey or tutorial, it goes to great lengths to expand on all the details involved in running proper experiments (such verbosity would never be tolerated in empirical domains like psychology). For CS and SE, this becomes a strength of this long paper.
The experiments presented here try to ascertain whether there are software maintainability tasks where static typing helps (or hinders). Various experiments were carefully designed to determine this, with all choices fully explained. For example, the choice of Java and Groovy is fully laid out, along with the reasons other valid choices were rejected. The results are then painstakingly analyzed, including a rather thorough analysis of the threats to validity and of the limitations of the conclusions.
Roughly speaking, the results are as expected: if the types encode useful information for understanding a particular piece of code, then maintaining that code is easier with types than without. For tasks where the type system “does not help,” there is no (statistically significant) difference. This of course raises the question of how much more useful types can be in practice for programs written in languages such as Haskell and Scala than in Java. It is also important to note that these conclusions all concern maintenance tasks, leaving the debate about exploratory programming (writing new code for an application that has never been built before) and type systems fully open.
Anyone curious about empirical methods in CS or SE could benefit from reading this paper: the thoroughness is exemplary. Anyone wishing to understand the authors’ results will find the paper exceedingly long, and might wish for an executive summary instead.