The size of a software test set is a prime factor in determining the cost and effectiveness of testing. If the test set is too small, faults escape detection. Conversely, too many tests waste effort in creating and executing them. Can the number of tests needed for full fault detection be estimated? The authors advocate using fault failure rates to make a probabilistic estimate of the number of test cases needed.
A formula based on fault failure rates is developed; it can be evaluated numerically for successive values of the test set size until the desired level of fault detection (up to 100 percent) is achieved. The formula is evaluated against 11 programs (the Siemens suite plus space, grep, gzip, and make). Its predicted effectiveness is always within five percent of the observed effectiveness for every test set size from one to the full test set size (which ranges from 211 to 13,585 test cases).
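To make the idea of evaluating such a formula for successive test set sizes concrete, here is a minimal sketch. It is not the authors' formula, which this review does not reproduce. It assumes that each fault has a known failure rate (the fraction of tests that expose it) and that tests are drawn independently at random, so a fault with rate r is detected by n tests with probability 1 - (1 - r)^n. The function names and the sample rates are hypothetical.

```python
def expected_detection(failure_rates, n):
    """Expected fraction of faults detected by n randomly chosen tests.

    Assumption (not taken from the paper): tests are drawn independently,
    so a fault with failure rate r escapes n tests with probability (1 - r)**n.
    """
    if not failure_rates:
        raise ValueError("at least one fault failure rate is required")
    detected = sum(1.0 - (1.0 - r) ** n for r in failure_rates)
    return detected / len(failure_rates)


def smallest_test_set(failure_rates, target, max_n):
    """Try n = 1, 2, ..., max_n and return the first n that meets the target.

    Returns None if the target is not reached within max_n tests.
    """
    for n in range(1, max_n + 1):
        if expected_detection(failure_rates, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical failure rates for three faults, for illustration only.
    rates = [0.5, 0.1, 0.02]
    print(smallest_test_set(rates, target=0.9, max_n=10_000))
```

Under this simplified model, expected detection only approaches 100 percent asymptotically unless every rate equals one, so the target in the sketch is best set slightly below 1.0.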
In real life, it is difficult to know fault failure rates, as was the case in these experiments. What happens if fault rates can only be estimated and are therefore inaccurate? The authors repeated their experiments using inaccurate fault rates, and they claim the model was still accurate. While this claim is true for their data, it is unconvincing, since their modified fault rate values differed from the actual values by at most five percent. What happens if the fault rates are off by ten, 25, or even 100 percent?
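The kind of sensitivity check this question calls for can be sketched as follows. This is an illustration only, not the authors' perturbation procedure. It assumes the same simplified independence model (a fault with failure rate r is detected by n random tests with probability 1 - (1 - r)^n) and assumes every rate is overestimated by the same relative amount; the function names and sample rates are hypothetical.

```python
def expected_detection(failure_rates, n):
    """Expected fraction of faults detected by n independent random tests."""
    return sum(1.0 - (1.0 - r) ** n for r in failure_rates) / len(failure_rates)


def prediction_error(true_rates, relative_error, n):
    """Absolute gap between predictions made from estimated and true rates.

    Assumption: every rate is overestimated by the same relative_error
    (0.10 means ten percent too high), clamped to the valid range [0, 1].
    """
    estimated = [min(1.0, max(0.0, r * (1.0 + relative_error))) for r in true_rates]
    return abs(expected_detection(estimated, n) - expected_detection(true_rates, n))


if __name__ == "__main__":
    # Hypothetical failure rates, for illustration only.
    rates = [0.5, 0.1, 0.02]
    for err in (0.05, 0.10, 0.25, 1.00):
        print(err, prediction_error(rates, err, n=20))
```

Sweeping the relative error over five, ten, 25, and 100 percent in this way would show how quickly the prediction drifts from what accurate rates give, which is the evidence the five percent perturbation alone does not provide.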
The results show that the effective test set size can be reduced by 70 to 80 percent for five of the programs and by 40 percent for three others. For the last three, the full test set is still needed. These results also invite skepticism. The 11 programs and their test suites are strongly bimodal. Seven of the programs are fewer than 1,000 lines long, but they have test sets of more than 1,000 tests (from 1,052 tests for a 406-line program to 5,542 tests for a 563-line program). These test suites are clearly much larger than necessary.
Three of the programs are thousands of lines long with small test sets (from 211 tests for a 6,573-line program to 793 tests for a 20,014-line program). These would just as obviously seem to require the full test suite to achieve 100 percent fault detection.
The paper is well written and should be of interest to researchers in software testing.