The authors discuss the empirical evaluation of general-purpose automated theorem proving systems, and present two schemes for empirical evaluation of fully automatic, sound, possibly incomplete, or bugged automated theorem proving systems for first-order classical logic, in terms of their ability to provide an assurance that a solution exists.
The first scheme evaluates automated theorem proving systems according to their ability to solve problems. The second scheme evaluates automated theorem proving problems. The methodology can be applied in the context of problems that are reasonably homogenous with respect to the systems. The scope of applicability of the evaluation schemes is clearly stated. In particular, the parameters relevant to the evaluation schemes are specified, the criteria used are explained, and the principles by which automated theorem proving systems are compared are indicated.
The problem addressed is relevant, as the evaluation methodology can be used to determine which systems are best suited to a specific application, and which techniques work well for different kinds of problems. The paper constitutes an interesting contribution to the field.
The paper is well written and organized. The evaluation schemes proposed are clearly described, and the resources used are indicated. The bibliographical references give an idea of the literature in the field. The intended readership is that of researchers studying automated theorem proving systems, and, specifically, automated theorem proving problem evaluation.