Computing Reviews
Perspectives on data science for software engineering
Menzies T., Williams L., Zimmermann T., Morgan Kaufmann Publishers Inc., San Francisco, CA, 2016. 408 pp. Type: Book (978-0-128042-06-9)
Date Reviewed: Apr 13 2017

Data science is a very hot topic. Case in point, we recently hired two specialist data scientists at my company. But data science is not something you pick up by running the latest data mining tool on your big data repository; rather, it is about arriving at hard-won insights gleaned from years of experience, perseverance, and hard work. So what more could you want than a book written by 77 contributing authors about their experiences, the mistakes they made, and the lessons they learned, while cataloging what worked and what did not work for them? But wait. Before you jump onto your favorite book acquisition website thinking this book has everything you need to know about data science, note that it focuses exclusively on data science for software engineering. In other words, how can data science improve the software development process? This is a good thing, because trust me, you would not be able to lift a book containing everything there is to know about the discipline.

With 69 chapters to cover and an 800-word review limit, I unfortunately have to be very brief. The book’s editors take the first chapter to tell us how the book came about. Zhang et al. share their experiences mining callstack traces, while Menzies explains seven principles of software engineering. Russo compares the search for patterns in data to what is performed in diagnostic medicine, and Whitehead details the importance of theory, a topic further expanded by Sjøberg et al.

Zeller describes mining apps for behavioral anomalies in order to detect malicious ones. Ranganath uses USB driver artifacts to narrow down testing requirements, while Nagappan and Shihab mine app store artifacts to assess app quality. Barr and Devanbu use cross-entropy to show that source code is much more predictable than natural language. Rotella predicts release readiness based on cumulative and weekly bug reports, and Lin et al. use online incident reports to speed diagnosis and repair. Fritz discusses the possibility of measuring individual productivity, while Theisen and Williams use stack traces to identify potential vulnerabilities to malicious attack. Huo et al. discuss the importance of visualization, and Huang details how cohort grouping aids analysis of gameplay data. Bener et al. detail their collaborative efforts with industry partners, Hering the benefits of targeted testing, and Hindle the ease with which false conclusions can be reached. Weyuker and Ostrand look at identifying fault-prone files, Baysal highlights the benefits of individually customized issue tracking, and Ruhe and Nayebi explain that it is the decisions based on data, rather than the data itself, that are important. Ray and Posnett look into whether the choice of programming language impacts code quality, and Czerwonka shows that finding defects is only a minor benefit of code reviews.

Next follow several chapters on data science techniques: Bird on conducting interviews, Holmes on how to find state transitions in temporal data, Zimmermann on using card sorting, and Spinellis on the benefits of building tools. Dybå et al. discuss analyzing evidence data, Minku explains how to choose a machine learning method, and Bacchelli explains how to summarize unstructured data, with Guo providing tips on preparing data. Wagner describes the challenges of natural language processing, and Budgen how to provide evidence using data. Bener and Tosun describe Bayesian networks, Peters covers the importance of balancing privacy and data sharing, and Minku appraises the benefits of combining predictive models to create ensembles. Di Penta encourages us to augment quantitative data with qualitative information, which Barik and Murphy-Hill extend by describing successful survey design.

The final chapters attempt to distill wisdom from the wise. Murphy encourages us to log everything. Godfrey discusses the provenance of software artifacts, Gousios encourages us to make our tools open, Carnahan cautions us to answer questions while the answer still has value, and Kim gives us steps to follow to deploy data science in our organizations. Adams talks about using version control systems to link defect reports to a release, Meneely discusses the challenge of measuring security, and Just and Herzig cover problems encountered mining bug reports. Diehl and Runeson advocate the use of data visualization, while Orso cautions us not to forget the developers whom we are attempting to help.

Next, Murphy hits hard against publications and their paper selection criteria. Meneely tells us to focus on actionable metrics, and Shepperd explains the need for studies that replicate results, a point reiterated later by Juristo. Valdivia-Garcia and Nagappan explain the need for selecting a diverse set of software projects for research. Bird cautions us against bias in our results, Menzies against jumping to conclusions too quickly, and Medvidovic and Orso against conclusions that have little value. Mockus discusses data scrubbing, Robbes addresses the challenges of dealing with small companies, and Vegas and Juristo list all that can go wrong in experiments. Zimmermann explains that a successful model won’t work for everybody, Ranganath that a simple explanation trumps a good model, and Prechelt that we should learn from failed expectations. Turhan and Kuutti extol asking simple questions, while Münch encourages frequent hypothesis testing. Storey warns of the dangers of data torture, while Ko tells us to use every bit of data we have.

Reviewer:  Bernard Kuc Review #: CR145197 (1706-0334)
General (D.2.0)
Data Mining (H.2.8 ...)
Content Analysis And Indexing (H.3.1)
Database Applications (H.2.8)
