A systematic literature review was conducted to identify the difficulties of testing scientific software. A keyword-based search returned more than 6000 hits. Filtering criteria, which mainly focused on titles and abstracts, reduced the number of studies analyzed to 62. Difficulties were identified with the development of test cases, the production of expected test-case output (the oracle problem), the execution of test cases, and the interpretation of test results.
The most common approach to tackling the oracle problem was to rely on an independently developed program that fulfilled the same specification (nine studies). One of the more interesting approaches reported was metamorphic testing, in which a known relation between a change in the input and the resulting change in the output is checked, so no exact expected output is needed (five studies). Surprisingly, professional judgment was also used to assess whether output was satisfactory (four studies). Relatively little use was made of recognized testing methods: only eight studies reported the use of unit testing, and only five reported the use of regression testing. Test coverage information was found to have been used in only two studies. The investigators conclude that the software engineering community needs to pay more attention to the oracle problem in scientific software, and that scientists who act as developers need to incorporate recognized testing methods into their software development processes.
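The metamorphic-testing idea can be illustrated with a minimal sketch (a hypothetical example using the sine identity sin(π − x) = sin(x), not one drawn from the reviewed studies): rather than knowing the exact correct output for an input, the test checks that two related executions agree.

```python
import math

def metamorphic_check_sin(x, tol=1e-12):
    """Metamorphic relation for a sine routine: sin(pi - x) should
    equal sin(x), so the two calls can be checked against each other
    without an oracle supplying the true value of sin(x)."""
    return abs(math.sin(math.pi - x) - math.sin(x)) <= tol

# Apply the relation across many inputs; no expected output is needed.
results = [metamorphic_check_sin(x / 10) for x in range(100)]
print(all(results))
```

Each metamorphic relation sidesteps the oracle problem for one property only; in practice several relations are combined to gain confidence in an implementation.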
This literature review was well conducted and provides many useful insights into the testing of scientific software. This paper is strongly recommended to anyone with an interest in scientific software.