The effectiveness of a program's test suite can be measured by the proportion of mutated versions of the program, each containing an injected defect, that the suite detects. Mutations (defects) are injected by applying simple rules, such as “negate decision” or “replace operator,” a process that is objective and repeatable. But are the detection rates for mutations representative of the detection rates for real faults? An experiment conducted to answer this question was based on one program with real faults, seven programs with hand-seeded faults, and existing comprehensive pools of test cases.
Mutants (mutated versions) were systematically created for each program. Then, 5,000 test suites of varying sizes were randomly drawn from each program's comprehensive test pool, and the detection rates for mutants, real faults, and hand-seeded faults were determined.
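The sampling procedure can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the pool, the per-test kill sets, and the suite size are all hypothetical stand-ins.

```python
import random

# Hypothetical pool: each test is tagged with the set of mutant ids it kills.
pool = {
    "t1": {1, 2},
    "t2": {2, 3},
    "t3": set(),
    "t4": {4},
}
num_mutants = 5

def detection_rate(suite):
    """Fraction of all mutants killed by at least one test in the suite."""
    killed = set().union(*(pool[t] for t in suite))
    return len(killed) / num_mutants

random.seed(0)  # fixed seed for a repeatable sketch
# Draw many random suites of a fixed size, as in the experiment's
# 5,000-suite sampling, and average their detection rates.
suites = [random.sample(sorted(pool), 2) for _ in range(5000)]
avg = sum(detection_rate(s) for s in suites) / len(suites)
```

Repeating this for each suite size (and for real and hand-seeded faults instead of mutants) yields the detection-rate-versus-size curves that the paper compares.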
Figures 5 and 6 in this paper, which show how detection rates vary with test suite size, provide convincing evidence that mutants are similar to real faults in detection difficulty, but that the hand-seeded faults are harder to detect. The latter result is explained by the fact that those who seeded the faults tended to filter out the ones that were easy to detect. The authors rightly conclude that hand-seeding faults is a subjective and undefined process, and that replication studies are needed to confirm their result that detection rates for mutants are representative of detection rates for real faults.
This paper is strongly recommended to those researching the effectiveness of software testing techniques.