The mean magnitude of relative error (MMRE) statistic is an unreliable criterion for evaluating and comparing software prediction models. Consequently, previous studies that have used MMRE as the basis of model comparison may have produced misleading results. The authors perform a simulation that examines several evaluation metrics and their ability to select the best among competing prediction models. MMRE proves to be a poor performer: it exhibits a bias toward models that underestimate, even in comparison with the actual data. The authors conclude that empirical software engineering needs to better understand, and draw upon, current research in statistics in order to analyze the accuracy of software prediction models.
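For reference, MMRE is conventionally defined as the mean of the absolute errors relative to the actual values, (1/n) Σ |y_i − ŷ_i| / y_i. The following minimal Python sketch (illustrative only; it is not the authors' simulation design, and the error distributions are hypothetical) suggests one intuition for the bias: a non-negative underestimate can never have a relative error above 1, whereas an overestimate's relative error is unbounded.

    import numpy as np

    def mmre(actual, predicted):
        """Mean magnitude of relative error: mean of |y - yhat| / y."""
        actual = np.asarray(actual, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        return float(np.mean(np.abs(actual - predicted) / actual))

    # Illustrative only -- not the authors' simulation design.
    # Both predictors err by the same multiplicative factor f in [1, 2]:
    # one always predicts low (actual / f), the other always high (actual * f).
    rng = np.random.default_rng(0)
    actual = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)
    factor = rng.uniform(1.0, 2.0, size=actual.size)

    print(mmre(actual, actual / factor))  # ~0.31: the underestimator looks better
    print(mmre(actual, actual * factor))  # ~0.50: the overestimator is penalized more

Although the two predictors are equally wrong in multiplicative terms, MMRE ranks the systematic underestimator as the more accurate model, which is consistent with the bias the authors report.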
The authors have written a concise, well-reasoned paper that should be read by all researchers working on software estimation models. An important implication of the paper is that evaluation metrics for software models must be chosen carefully. The authors find, however, that no single metric stands out; their best advice is that researchers must clearly justify the use of one metric, or a combination of evaluation metrics, in any comparison of software models.