Computing Reviews

TravisTorrent: synthesizing Travis CI and GitHub for full-stack research on continuous integration
Beller M., Gousios G., Zaidman A. MSR 2017 (Proceedings of the 14th International Conference on Mining Software Repositories, Buenos Aires, Argentina, May 20-28, 2017), 447-450, 2017. Type: Proceedings
Date Reviewed: 01/04/18

In software engineering, it is well known that industry plays an important (if not the most important) role in driving innovation. It is also true that some innovations are adopted without a deep understanding of their benefits and costs. Continuous integration (CI) is a case in point. This paper describes a large dataset, dubbed TravisTorrent, about Travis-CI, a CI platform for open-source software.

CI platforms automatically take source code changes, compile the code, and then run it through a testing pipeline. This process generates a large amount of data that tracks how well a source code change fares in the tests. TravisTorrent synthesizes such data to enable academic research on Travis-CI, giving researchers the opportunity to challenge traditional views on CI usage. At the MSR 2017 conference, TravisTorrent was chosen as the dataset for the yearly mining challenge [1], where participants used it to investigate a broad variety of research questions.

TravisTorrent contains roughly three million build jobs from over 1,000 open-source software projects. The data was harvested with a set of open-source tools: Travis Poker, which checks whether a project has a Travis-CI build history; Travis Harvester, which downloads Travis-CI logs; Travis BuildLog Analyzer; and Travis Build Metadata Extractor.

Arguably, TravisTorrent is an important asset that can help researchers offer critical insights about CI. However, further data curation could well be warranted. All data is distributed as compressed SQL, but no normalization was done. Hence, there are data redundancies that could mislead users. Another concern is inconsistent states in several records. For example, the tr_log_tests_failed column should report the names of the tests that failed, as extracted by build log analysis, but several failed build jobs report this column as empty, even when the tr_log_bool_tests_ran column, which indicates whether tests ran, is true.
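The inconsistency described above can be screened for with a simple filter. The following is only a minimal sketch in Python, not the authors' tooling: the columns tr_log_tests_failed and tr_log_bool_tests_ran come from the dataset as described in this review, while the status column name (tr_status), its "failed" value, the job_id key, and the sample rows are assumptions made for illustration.

```python
def find_inconsistent_jobs(rows):
    """Return rows where tests ran and the job failed, yet no failed test names are recorded."""
    flagged = []
    for row in rows:
        tests_ran = bool(row.get("tr_log_bool_tests_ran"))
        # Treat a missing value (None) and a blank string the same way.
        failed_names = (row.get("tr_log_tests_failed") or "").strip()
        # "tr_status" and the value "failed" are assumed names, not confirmed by the review.
        job_failed = row.get("tr_status") == "failed"
        if job_failed and tests_ran and not failed_names:
            flagged.append(row)
    return flagged


# Hypothetical sample rows; real data would be loaded from the SQL dump.
sample_rows = [
    {"job_id": 1, "tr_status": "failed", "tr_log_bool_tests_ran": True,
     "tr_log_tests_failed": "FooTest#bar"},
    {"job_id": 2, "tr_status": "failed", "tr_log_bool_tests_ran": True,
     "tr_log_tests_failed": ""},
    {"job_id": 3, "tr_status": "passed", "tr_log_bool_tests_ran": True,
     "tr_log_tests_failed": ""},
    {"job_id": 4, "tr_status": "failed", "tr_log_bool_tests_ran": False,
     "tr_log_tests_failed": None},
]

inconsistent = find_inconsistent_jobs(sample_rows)
print([row["job_id"] for row in inconsistent])
```

Rows flagged this way are only candidates for closer inspection: a job may fail for reasons unrelated to tests (for instance, a later pipeline step), so an empty tr_log_tests_failed value is not necessarily an error in every case.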

Overall, TravisTorrent provides access to a rich dataset for CI researchers. I recommend this paper to anyone interested in synthesized Travis-CI data. Further, I hope the authors and other researchers keep up the good work by offering new and more accurate dataset snapshots.


1) MSR Mining Challenge. http://2017.msrconf.org/#/challenge (06/27/2017).

Reviewer:  Klerisson Paixao Review #: CR145753 (1805-0244)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™