Are automatically generated test suites better than manually written test suites? To answer this question, ten Java applications with existing manually written test suites were tested using the EVOSUITE and CodePro automated test generation tools. Branch coverage and mutation scores were used to assess the quality of the test suites; the JaCoCo and MAJOR tools were used to calculate branch coverage and mutation scores, respectively.
EVOSUITE covered 31.86 percent of branches on average and had an average mutation score of 39.89 percent. For the manually written test suites, the figures were 31.5 percent and 42.14 percent, respectively. The authors conclude that these results should encourage the use of a tool such as EVOSUITE to produce tests. CodePro's test quality was found to be much lower, which the authors attributed to missing or weaker oracles.
The authors also investigated the relationship between branch coverage and mutation score. On inspection, Figures 7 and 8 do suggest that the two measures are correlated for both EVOSUITE and the manually written test suites. It is unclear, however, what the actual correlation coefficients are. The investigators imply that they calculated non-linear fits, yet Figures 7 and 8 show straight lines.
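Reporting a coefficient would settle what the figures only suggest. The sketch below contrasts Pearson's r, which measures linear association (what a straight fitted line reflects), with Spearman's rho, which captures any monotone, possibly non-linear, relationship. It assumes one (branch coverage, mutation score) pair per application; the function names and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient: strength of *linear* association."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: strength of any *monotone* association."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical per-application values (percent), for illustration only.
    coverage = [10.0, 20.0, 30.0, 40.0, 50.0]
    mutation = [12.0, 15.0, 30.0, 60.0, 65.0]
    print(f"Pearson r    = {pearson(coverage, mutation):.3f}")
    print(f"Spearman rho = {spearman(coverage, mutation):.3f}")
```

On monotone but curved data like the example, Spearman's rho is exactly 1 while Pearson's r falls below 1, which is why the choice between a linear and a non-linear fit matters when interpreting such plots.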
The analysis presented has two major weaknesses. First, equivalent mutants are neither discussed nor accounted for. Because an equivalent mutant behaves identically to the original program, no test can kill it, so leaving such mutants in the denominator depresses the score; mutation scores can change sizably when equivalent mutants are factored out. Second, there is no discussion of the degree to which the branches covered by the tests overlapped with the branches containing mutations.
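Both weaknesses can be made concrete with a little arithmetic. The sketch below computes a mutation score with and without excluding a given number of equivalent mutants, and the fraction of mutants that lie on branches the tests actually cover. It assumes that equivalent mutants have already been identified and that each mutant can be mapped to a single branch identifier; the function names, branch IDs, and counts are hypothetical and are not taken from the paper, JaCoCo, or MAJOR.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Killed mutants divided by the number of non-equivalent mutants.

    With equivalent=0 this is the raw score; excluding equivalent mutants
    removes mutants that no test could ever kill from the denominator.
    """
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("there must be at least one non-equivalent mutant")
    if killed < 0 or killed > non_equivalent:
        raise ValueError("killed must lie between 0 and the non-equivalent count")
    return killed / non_equivalent


def mutants_on_covered_branches(mutant_branch: dict[str, str],
                                covered: set[str]) -> float:
    """Fraction of mutants located on a branch the test suite covers.

    mutant_branch maps a (hypothetical) mutant ID to the branch ID it mutates.
    """
    if not mutant_branch:
        raise ValueError("no mutants given")
    hits = sum(1 for branch in mutant_branch.values() if branch in covered)
    return hits / len(mutant_branch)


if __name__ == "__main__":
    # Hypothetical counts: 1000 mutants, 400 killed, 150 judged equivalent.
    print(f"raw score:      {mutation_score(400, 1000):.3f}")
    print(f"adjusted score: {mutation_score(400, 1000, equivalent=150):.3f}")

    # Hypothetical mapping of four mutants onto three branches.
    mutants = {"m1": "b1", "m2": "b2", "m3": "b1", "m4": "b3"}
    print(f"overlap:        {mutants_on_covered_branches(mutants, {'b1'}):.2f}")
```

In this made-up example, excluding 150 equivalent mutants raises the score from 0.400 to about 0.471, and only half of the mutants sit on covered branches, so a test suite could not kill the other half regardless of its oracles.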
Despite the shortcomings identified, this paper is strongly recommended to those working in software testing.