Software engineers have long believed that software complexity and maintenance costs are strongly related. This paper discusses the empirical evidence for the existence and nature of the relationship. It then describes the results of a carefully constructed statistical analysis of 65 maintenance projects on 17 large COBOL applications. The analysis is based on a statistical model of maintenance costs that includes seven project factors (such as the number of lines of code changed or added, programmer skill, and the use of structured analysis or design) and three complexity measures. The experimenters attempt to verify or reject three hypotheses: “controlling for other factors known to affect software maintenance costs, the costs will depend significantly on” (1) the average size of a module’s procedures, “with costs rising for applications whose average procedure size is either very large or very small,” (2) the average module size, “with costs rising for applications whose average module size is either very large or very small,” and (3) the incidence of branches beyond paragraph boundaries, “with costs rising with increases in the incidence of branching.”
The statistical model is an equation for cost that is linear in the seven project factors, includes both a first-order and a second-order term for procedure size and for module size, and is linear in branching complexity. An ordinary least squares regression is used to fit the model. The hypotheses are tested by using an F-test to decide whether one can reject the null hypotheses that the regression coefficients of the complexity terms are zero, and by examining the signs of the coefficients of the linear and squared terms. The results confirm hypothesis (1); partly confirm hypothesis (2), since the regression shows that costs rise with average module size but detects only a linear effect; and confirm hypothesis (3).
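The form of the model described above can be sketched as follows. The symbols are assumptions introduced here for illustration and are not the paper's own notation: X_i stands for the seven project factors, P for average procedure size, M for average module size, B for the incidence of branches beyond paragraph boundaries, and the Greek letters for the regression coefficients. The exact form of the dependent variable (for example, whether cost is transformed) is not specified here.

```latex
\begin{aligned}
\text{Cost} ={}& \beta_0 + \sum_{i=1}^{7} \beta_i X_i
  + \gamma_1 P + \gamma_2 P^2
  + \delta_1 M + \delta_2 M^2
  + \theta B + \varepsilon \\
\frac{\partial\,\text{Cost}}{\partial P} ={}& \gamma_1 + 2\gamma_2 P = 0
  \quad\Longrightarrow\quad P^{*} = -\frac{\gamma_1}{2\gamma_2}
\end{aligned}
```

Under this sketch, a cost curve that rises for both very small and very large procedures requires a positive squared-term coefficient (\gamma_2 > 0) together with a negative linear coefficient (\gamma_1 < 0), so that the cost-minimizing size P^{*} is positive. This is why the signs of the linear and squared terms are examined in addition to the F-test. A purely linear module-size effect corresponds to \delta_1 > 0 with \delta_2 not distinguishable from zero, and hypothesis (3) corresponds to \theta > 0.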
The results are encouraging because they are based on a large sample and they attempt to control for a number of possible confounding factors, so this work is a good model for one approach to empirical software engineering studies. They also provide quantitative data that are useful for software managers working in the environment under study and possibly in other COBOL environments. The results are discouraging because they do not appear to be generalizable to languages other than COBOL and because they focus on measures of code complexity; they provide little guidance for the software designer on how to organize his or her software into modules or on what strategy to use to make a design that is robust with respect to both performance and change.