At 29 pages, this paper provides a more detailed “what we did” report than is usual for research papers. The researchers set out to empirically assess the validity of the consistent programmer hypothesis. A common basis for that hypothesis is the observation that programmers, like authors in general, tend to exhibit idiosyncratic patterns of language use. (What distinguishes your favorite fiction author from other authors?)
More specifically, the researchers set out to distinguish C source code authorship by examining the “style facets” of written C programs. Because the paper was published in a testing journal, it also frequently notes the relevance of the research to software testing.
The researchers recruited five experienced professional C programmers as volunteer subjects. Working independently from three written specifications, each of the five wrote three C programs. The researchers analyzed those 15 programs by manually tallying features (such as counts of semicolons used), by subjecting them to static analysis with the “lint” tool and the automatic test analysis for C software tool, and by subjecting them to dynamic analysis with the PISCES test analysis tool. Additionally, the researchers acquired 60 student programs (each of 15 students in a networking class wrote four C programs) and analyzed them by the same processes used for the 15 professional programs. In this paper, however, the researchers emphasize their findings from the professional programs.
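The manual feature tallies described above can be approximated programmatically. The following is a minimal sketch in Python, with an entirely illustrative choice of style facets (semicolon counts, blank-line counts, identifier lengths); the paper's actual facet list and extraction procedure are not reproduced here:

```python
# Sketch: tally a few simple "style facets" from C source text.
# The facet choices below are illustrative assumptions, not the
# paper's actual facet definitions.
import re

def tally_style_facets(c_source: str) -> dict:
    lines = c_source.splitlines()
    # Rough identifier match: a C identifier-shaped token (includes keywords).
    identifiers = re.findall(r"\b[A-Za-z_]\w*\b", c_source)
    return {
        "semicolons": c_source.count(";"),
        "lines": len(lines),
        "blank_lines": sum(1 for ln in lines if not ln.strip()),
        "mean_identifier_length": (
            sum(map(len, identifiers)) / len(identifiers) if identifiers else 0.0
        ),
    }

sample = "int main(void) {\n    int x = 0;\n\n    return x;\n}\n"
facets = tally_style_facets(sample)
print(facets["semicolons"], facets["blank_lines"])  # → 2 1
```

Per-author profiles would then be vectors of such counts, one per program, ready for the probability calculations the researchers describe.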
Using the data from their analyses, the researchers calculated probabilities separately for the professional and the student programs, flagging as significant any probability of 0.05 or less. Most of these, however, seem irrelevant to the statistical testing (covered in Section 5.4) of the researchers’ nine hypotheses (listed in Section 3). I could find no coverage of any testing of the paper’s title hypothesis. Section 5.2.1 presents a “discriminator for identifying programmers,” built in a manner similar to the Welker and Oman Maintainability Index, but offers no statistical significance testing of the discriminator.
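The Welker and Oman Maintainability Index is a fixed weighted combination of code metrics, so a discriminator built “in a like manner” would presumably fold several style-facet measurements into a single per-author score. A hedged sketch of that idea follows; the facet names and weights are entirely hypothetical, since the paper’s actual coefficients are not reproduced in this review:

```python
# Hypothetical facet weights -- NOT the paper's actual discriminator
# coefficients, which this review does not reproduce.
WEIGHTS = {
    "semicolons_per_line": 40.0,
    "mean_identifier_length": 3.5,
    "blank_line_ratio": 25.0,
}

def discriminator_score(facets: dict) -> float:
    # Weighted linear combination of facet measurements, in the spirit
    # of the Maintainability Index's fixed-formula construction.
    return sum(WEIGHTS[name] * facets[name] for name in WEIGHTS)

# Example facet vector for one author's program (illustrative values).
facets = {
    "semicolons_per_line": 0.8,
    "mean_identifier_length": 6.0,
    "blank_line_ratio": 0.1,
}
score = discriminator_score(facets)
print(round(score, 1))  # → 55.5
```

Authorship attribution would then amount to comparing a new program’s score against each candidate author’s typical range, which is exactly the step that would benefit from the significance testing the review finds missing.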
Overall, this paper has an interesting research design, good coverage of related work by others, and an impressive choice of references. The researchers have nicely enhanced the presentation of the software testing aspects of their work (to meet the journal’s needs) while still keeping what was probably the core of their original research report. Regrettably, the relatively small number of programmer subjects limited the statistical power of the research findings.