Test-driven development (TDD) reduces defect density by at least a factor of two, at the expense of increasing coding time by 15 to 35 percent. Finally, we have measures on which to base project management decisions. Or do we? The measures derive from two industrial case studies at Microsoft, projects A and B. Defect densities from these two projects were compared with those from comparable projects that used a non-TDD methodology. Managers supplied the estimates of how much TDD had increased coding time.
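To make the headline claim concrete, defect density is conventionally reported as defects per thousand lines of code (KLOC), and "a factor of two" means the non-TDD project's density is at least twice the TDD project's. The following sketch uses hypothetical figures, not numbers from the case studies:

```python
def defect_density(defects: int, loc: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects / (loc / 1000)

# Hypothetical figures for a TDD project and its comparison project.
tdd = defect_density(defects=20, loc=25_000)      # 0.80 defects/KLOC
non_tdd = defect_density(defects=50, loc=24_000)  # ~2.08 defects/KLOC

improvement_factor = non_tdd / tdd                # ~2.6, i.e. "at least 2x"
print(f"TDD: {tdd:.2f}/KLOC, non-TDD: {non_tdd:.2f}/KLOC, "
      f"factor: {improvement_factor:.1f}x")
```

Note that the ratio is sensitive to both numerator and denominator, which is exactly why the confounds discussed below (how hard the software was exercised, how big the codebase was, which defects were counted) matter so much.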
However, we do not know the operational profiles of the software: would exercising the comparable projects more heavily have uncovered more defects? Nor do we know about other software quality assurance activities; what did the test teams do? Developer ability can vary considerably, yet project A had six developers while its comparable project had only two. Complexity typically correlates with lines of code (LOC), yet project B was one-fifth the size of its comparable project in source LOC. Finally, we do not know about defect severity: do the measured improvements rely on the inclusion of minor defects?
The authors' notion of a comparable project is fatally flawed, and too much about the data remains unknown. Readers cannot simply accept that TDD reduces defect density by at least a factor of two. This paper will interest only those researching test-driven development.