Computing Reviews
Cross-project defect prediction: a large scale experiment on data vs. domain vs. process
Zimmermann T., Nagappan N., Gall H., Giger E., Murphy B. ESEC/FSE 2009 (Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Amsterdam, the Netherlands, Aug. 24-28, 2009), 91-100. Type: Proceedings
Date Reviewed: Dec 29 2009

For the last 20 years, software professionals have sought the “silver bullet” of testing: the ability to predict software defect rates for the next project. Researching this goal from a cross-project perspective seems innovative and promising. At the least, this research may reveal new measures and techniques for predicting defects in the next version of the same project. At the most, it may reveal, for the first time, prediction measures and techniques that work across projects as well as within them.

While the authors’ research goal is laudable, the assumptions and approach fall wide of the mark. Zimmermann et al. meticulously analyze all of the development data available to them, except for the actual defect data itself. Specifically, no true defect analysis is reported for any of the projects under review. Proper defect analysis includes the number and frequency of defects discovered in each development phase and, post-development, by the date each defect was discovered (at release, deployment, distribution, and so on); the clustering of defects over time by test discovery characteristics (root cause analysis); and defect severity. The authors offer only “the probability that a particular software element (such as a binary) will fail in operation in the field”--no defect quantification, no defect characterization, no root cause prediction, and no estimate of the project or end-user impact caused by the defect.
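
To make the contrast concrete, the kind of defect analysis called for above starts from per-defect records and simple aggregations over them. The Python sketch below is purely illustrative; the record fields, phase names, and severity scale are my own assumptions, not data from the paper under review.

from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical per-defect record; the fields and example values are
# illustrative assumptions, not data drawn from the paper under review.
@dataclass
class Defect:
    defect_id: str
    phase_found: str   # e.g., "design", "coding", "system test", "post-release"
    date_found: date
    root_cause: str    # e.g., "requirements gap", "logic error", "interface"
    severity: int      # e.g., 1 (critical) through 4 (cosmetic)

def summarize(defects: list[Defect]) -> dict:
    """Aggregate the counts a defect-focused analysis would report."""
    return {
        "by_phase": Counter(d.phase_found for d in defects),
        "by_root_cause": Counter(d.root_cause for d in defects),
        "by_severity": Counter(d.severity for d in defects),
        "post_release_total": sum(d.phase_found == "post-release" for d in defects),
    }

# Example usage with made-up records:
sample = [
    Defect("D-1", "coding", date(2009, 3, 2), "logic error", 2),
    Defect("D-2", "post-release", date(2009, 9, 14), "interface", 1),
]
print(summarize(sample))

Trending such aggregates across releases, rather than modeling only the probability that a binary will fail, is what would let a cross-project comparison rest on actual defect behavior.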

The authors’ failure to find cross-project defect predictors is not surprising; consider an analogy with an automotive consumer group that fails to compare the gas mileage of two different cars because it opts not to drive (test and report) either car. The group instead measures the cars’ dimensions and weights (analogous to the authors measuring code KLOCs); notes each car’s engine characteristics (analogous to the authors measuring code cyclomatic complexity); notes each car’s manufacturer information (analogous to the authors contrasting company development process differences); and notes each car’s model (analogous to the authors comparing domain similarities). Just as gas mileage can best be determined by driving the cars a measured distance with a measured amount of fuel, defects can best be predicted by discovering repeated patterns in development process flaws and the most effective testing method that reveals them.

Zimmermann et al. apparently have access to rich sets of project data that could yet reveal the cross-project defect predictors that they seek. I encourage them to revisit their project data and perform analyses that are focused much more on actual defect metrics, both individually and collectively, to achieve their goal. That better focus might be achieved in several ways.

One option is to give first preference to applications in different companies that are developed using very similar development processes. Pieces of software developed by radically different methods, even in the same company, tend to have radically different defect profiles. Different development methods present different opportunities to introduce different kinds of flaws in design and coding.

They could also ensure that first-preference applications in different companies are tested using a very similar testing process. One way to ensure that the testing process is similar is to choose companies that employ certified testers.

Another option is to give first analysis preference to actual post-development defect counts and characteristics instead of pre-development defect counts and characteristics. Post-development defects have the largest economic impact on both the software developer (cost of correction) and the end user (cost of loss of business). For the surprising magnitude of the cost difference between pre- and post-development defect correction for software development companies, see Boehm and Basili’s article [1].
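
As a back-of-the-envelope illustration of that cost gap, the sketch below applies the roughly 100-to-1 post-delivery versus requirements/design correction-cost ratio that Boehm and Basili report [1]; the base cost and defect counts are arbitrary assumptions chosen only to show the arithmetic.

# The ~100:1 cost ratio is the figure reported in [1]; everything else here
# (base cost, defect counts) is an arbitrary assumption for illustration.
BASE_FIX_COST = 100              # assumed cost to fix a defect caught in design
POST_DELIVERY_MULTIPLIER = 100   # post-delivery fixes cost ~100x more [1]

defects_caught_in_design = 50    # assumed count
defects_escaped_to_field = 5     # assumed count

pre_release_cost = defects_caught_in_design * BASE_FIX_COST
post_release_cost = defects_escaped_to_field * BASE_FIX_COST * POST_DELIVERY_MULTIPLIER

print(f"Pre-development correction cost:  {pre_release_cost}")   # 5000
print(f"Post-development correction cost: {post_release_cost}")  # 50000

Even with ten times fewer defects escaping to the field, the post-development correction bill is ten times larger, which is why post-development defect data deserves first analysis preference.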

Finally, they could use the existing study’s software development data--KLOCs, code churn, and application domain--as secondary, refining considerations after actual defect data comparisons have yielded positive preliminary results.

Reviewer:  Gerald Everett Review #: CR137599 (1102-0184)
[1] Boehm, B.; Basili, V. R. Software defect reduction top 10 list. Computer 34, 1 (2001), 135–137. http://doi.ieeecomputersociety.org/10.1109/2.962984.
 
Performance Measures (D.2.8)
Process Metrics (D.2.8)
Software Quality Assurance (SQA) (D.2.9)
Management (D.2.9)
Metrics (D.2.8)
 