Computing Reviews
TravisTorrent: synthesizing Travis CI and GitHub for full-stack research on continuous integration
Beller M., Gousios G., Zaidman A. MSR 2017 (Proceedings of the 14th International Conference on Mining Software Repositories, Buenos Aires, Argentina, May 20-28, 2017), 447-450, 2017. Type: Proceedings
Date Reviewed: Jan 4 2018

In software engineering, industry has long played an important (if not the most important) role in driving innovation. Yet some innovations are adopted without a deep understanding of their benefits and costs. Continuous integration (CI) is a case in point. This paper describes a large dataset, dubbed TravisTorrent, about Travis-CI, a CI platform for open-source software.

CI platforms automatically take source code changes, compile the code, and then run it through a pipeline of tests. This process generates a large amount of data that tracks how well each source code change fares in the tests. TravisTorrent synthesizes this data to enable academic research on Travis-CI, giving researchers the opportunity to challenge traditional views on CI usage. At the MSR 2017 conference, TravisTorrent was chosen as the dataset for the yearly mining challenge [1], where participants used it to investigate a broad variety of research questions.

TravisTorrent contains roughly three million build jobs from over 1,000 open-source software projects. The data was harvested by a set of open-source tools: Travis Poker, which checks whether a project has a Travis-CI build history; Travis Harvester, which downloads Travis-CI logs; Travis BuildLog Analyzer; and Travis Build Metadata Extractor.

Arguably, TravisTorrent is an important asset that helps researchers offer critical insights about CI. However, further data curation is warranted. All data is distributed as compressed SQL, but no normalization was done. Hence, there are data redundancies that could mislead users. Another concern is the inconsistent state of several records. For example, the tr_log_tests_failed column should report the names of the tests that failed, as extracted by build log analysis, but several failed build jobs report this column as empty, even when the tr_log_bool_tests_ran column, which indicates whether tests ran, is true.
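
As an illustration of the kind of curation check meant here, the following minimal Python sketch flags build jobs that failed and ran tests, yet list no failed test names. It is not part of the paper or its tooling. The columns tr_log_tests_failed and tr_log_bool_tests_ran are the ones discussed above; the status column name ("tr_status"), the value "failed", and the sample rows are assumptions made up for the example. A flagged row is only a candidate for manual inspection, since a build can fail for reasons other than a failing test.

```python
from typing import Iterable, List, Mapping

TRUTHY = {"true", "t", "1"}


def find_suspicious_jobs(
    rows: Iterable[Mapping[str, object]],
    status_field: str = "tr_status",   # assumed column name, not confirmed by the review
    failed_value: str = "failed",      # assumed status value, not confirmed by the review
) -> List[Mapping[str, object]]:
    """Return failed jobs where tests ran but no failed test names were recorded."""
    suspicious = []
    for row in rows:
        # Accept booleans as well as textual flags such as "true" or "1".
        tests_ran = str(row.get("tr_log_bool_tests_ran", "")).strip().lower() in TRUTHY
        # Treat a missing value and a blank string alike as "no names recorded".
        failed_names = str(row.get("tr_log_tests_failed") or "").strip()
        job_failed = row.get(status_field) == failed_value
        if job_failed and tests_ran and not failed_names:
            suspicious.append(row)
    return suspicious


# Hypothetical sample rows, invented purely for illustration.
sample_rows = [
    {"job": 1, "tr_status": "failed", "tr_log_bool_tests_ran": True, "tr_log_tests_failed": ""},
    {"job": 2, "tr_status": "failed", "tr_log_bool_tests_ran": True, "tr_log_tests_failed": "testFoo"},
    {"job": 3, "tr_status": "passed", "tr_log_bool_tests_ran": True, "tr_log_tests_failed": ""},
    {"job": 4, "tr_status": "failed", "tr_log_bool_tests_ran": False, "tr_log_tests_failed": None},
]

flagged = find_suspicious_jobs(sample_rows)
flagged_jobs = [row["job"] for row in flagged]
```

With these sample rows, only job 1 is flagged: job 2 names its failing test, job 3 passed, and job 4 ran no tests.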

Overall, TravisTorrent provides access to a rich dataset for CI researchers. I recommend this paper to anyone interested in synthesized Travis-CI data. Further, I hope the authors and other researchers keep up the good work by offering new and more accurate dataset snapshots.

Reviewer: Klerisson Paixao
Review #: CR145753 (1805-0244)
1) MSR Mining Challenge. http://2017.msrconf.org/#/challenge (06/27/2017).
Data Mining (H.2.8 ... )
 