Released software products are expected to work in each user's environment. However, the many combinations of operating systems, runtime environments, and hardware architectures on which the software ought to run place a growing burden on developers' shoulders. Zolfagharinia et al. show that not every environment yields reliable software build results for the Comprehensive Perl Archive Network (CPAN), a large collection of Perl software and documentation.
Compatibility issues are expected when building software across multiple platforms. Hence, modern software engineering relies on continuous integration (CI) practices to build and test the software, throughout its development, against numerous environments. Usually, a CI build result is binary: pass or fail. As the study reveals, however, interpreting build results is not as straightforward as it seems. The same software change can fail for different reasons in different environments, challenging developers to quickly diagnose and fix these failures.
Further, intrigued by the extent to which environments can cause build failures, the authors developed a prediction model that accounts only for operating-system and runtime-environment features. Surprisingly, it predicts build results with reasonable accuracy. However, the experiment evaluated the model with ten-fold cross-validation, which is an inappropriate choice given the time-dependent nature of build data. This technique randomizes the data across folds, so the training folds end up containing builds that occurred after the build whose result we are trying to predict. In simple words, future information is used to predict the past. Online evaluation scenarios, in which the model is trained only on builds that precede the one under test, are more appropriate for the task.
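To make the leakage concrete, the following sketch contrasts a randomized k-fold split with a time-ordered, expanding-window evaluation over chronologically sorted builds. The function names and the expanding-window scheme are illustrative assumptions for this note, not the authors' actual experimental setup.

```python
# Illustrative sketch (not the paper's code): randomized k-fold vs.
# a time-ordered ("online") evaluation over builds sorted by date.
import random

def kfold_splits(n, k, seed=0):
    """Randomized k-fold: shuffles indices, so training folds can
    contain builds that occurred AFTER the test builds (leakage)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def online_splits(n, k):
    """Expanding-window evaluation: each test fold is strictly later
    in time than all of the data it is predicted from."""
    fold = n // (k + 1)
    for i in range(1, k + 1):
        train = list(range(0, i * fold))
        test = list(range(i * fold, min((i + 1) * fold, n)))
        yield train, test

# With 100 chronologically ordered builds, every online split keeps
# the training data strictly before the test data.
for train, test in online_splits(100, 4):
    assert max(train) < min(test)

# Randomized k-fold does not: some training builds postdate test builds.
leaky = any(max(tr) > min(te) for tr, te in kfold_splits(100, 10))
print(leaky)  # True: future builds leak into the training data
```

Any time-aware scheme of this kind (expanding or sliding window) avoids the problem, because the model never sees a build that postdates the one it is asked to predict.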
Generally, this paper will interest software engineering researchers. More specifically, it offers a sound rationale for interpreting build results with care rather than treating them as a simple pass/fail signal.