Static analysis tools have been used to detect pre-release defects at Microsoft for six years. More than 12 percent of the pre-release defects fixed in Windows Server 2003 were found with the PREfix and PREfast static analysis tools. This paper uses historical data to determine how well defects found by static analysis predict pre-release defect density, as measured by the defects found by all other pre-release methods.
Data were analyzed at the component level for 199 components of Windows Server 2003 (22 million lines of code). Using the technique of data splitting, random samples of 132 components were used to build regression models whose predictive ability was then assessed on the remaining 67 components. Figure 3 shows how the estimated defect density tracks the actual defect density for three such random samples. A discriminant analysis is reported to correctly classify 165 of the 199 components (82.91 percent) as either fault prone or not fault prone.
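To make the methodology concrete, the following is a minimal sketch of the data-splitting and classification steps, not the authors' actual code. All data here are synthetic, and the assumed linear relation between the two densities is purely illustrative; only the component counts (199 total, a 132/67 split) are taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_components = 199

# Hypothetical per-component measurements:
#   x = defect density found by static analysis (defects per KLOC)
#   y = defect density found by all other pre-release methods
x = rng.gamma(shape=2.0, scale=1.5, size=n_components)
y = 2.0 * x + rng.normal(scale=1.0, size=n_components)  # assumed relation + noise
X = x.reshape(-1, 1)

# Data splitting: fit on a random 132-component sample,
# assess predictions on the remaining 67 components.
X_fit, X_test, y_fit, y_test = train_test_split(
    X, y, train_size=132, random_state=0)

model = LinearRegression().fit(X_fit, y_fit)
y_pred = model.predict(X_test)
print(f"held-out correlation: {np.corrcoef(y_test, y_pred)[0, 1]:.2f}")

# Discriminant analysis: classify components as fault prone or not.
# The median actual density is an arbitrary cutoff chosen for this sketch.
fault_prone = (y > np.median(y)).astype(int)
lda = LinearDiscriminantAnalysis().fit(X, fault_prone)
accuracy = lda.score(X, fault_prone)
print(f"components correctly classified: {accuracy:.1%}")
```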
One omission is that the false positive rates of PREfix and PREfast are not reported, although there is an indication that some false positives may have entered the defect database. While Figure 3 demonstrates that the predictions track the actual values in general, at least three components have much higher defect densities that the regression models fail to track. Why were these particular components so much worse? Could other techniques, for example, software metric approaches, have predicted that these components were very fault prone? We do not know. This paper is recommended to those working in software quality assurance.