Software engineering aims, among other goals, to improve software quality and contain its costs; this paper is a small, incremental contribution toward those goals. Its claimed novelty lies in formally asserting a correlation between the amount of logging in certain code and the number of defects in that code. Although the association may appear intuitive, the paper employs original metrics and statistical techniques to provide a quantitative justification for the claim, and it conditionally succeeds in that intent. The paper enumerates the conditions under which the claim holds: the results are valid for the studied software (certain releases of Hadoop and JBoss) and, arguably, for similar code, though "similar" is a qualifier that is difficult to define precisely; several lesser constraints arise from the internal methodology.
This work correctly asserts that correlation does not imply causation, and the value of its outcome lies in focusing expensive resources (say, code reviews) on code sections that are more error-prone than others. An intrinsic weakness of this statistical study is its inability to attribute causality to its results. For instance, the work indirectly admits that a structural code change between releases confounded the extracted metrics to the point of statistical insignificance. One can argue that confounding sufficient to invalidate the results could also arise from practices as simple as the routine adoption of static-analysis tools (say, FindBugs or equivalent), or from any of a panoply of other software engineering techniques. As a minor point, the paper is unnecessarily repetitive and verbose.