Computing Reviews

The data science design manual
Skiena S., Springer International Publishing, New York, NY, 2017. 445 pp. Type: Book
Date Reviewed: 02/23/18

The 14 chapters of this book have been carefully devised to provide a comprehensive introduction to data science as an academic discipline. The special feature of this text is that it does so by focusing on the skills and principles needed to design systems for collecting, analyzing, and interpreting data. The contents of the book as expressed by the chapter structure reflect contributions from computer science, statistics, and artificial intelligence (in particular, machine learning).

The first chapter is devoted to discussing what data science is. The discussion includes several interesting and inspiring insights. Most of them are based on identifying fundamental differences between computer science or software engineering and data science. Real scientists are data driven, while computer scientists are method driven. This is one of the theses the author formulates to justify and describe the new discipline. The chapter does not provide a definitive answer to the question of whether data science should be generally accepted as a new scientific discipline, but it is worth reading for anyone interested.

Chapter 2 provides mathematical preliminaries, in particular probability, statistics, correlation analysis, and logarithms. Anyone contemplating how to cover such a wide range of preliminaries in 30 pages would realize the immense difficulties involved. Obviously, this is not a book on either probability or statistics: in 30 pages, it is not possible to develop results the way a standard textbook on those disciplines would, only to present and explain a selection of them. The author has mastered this brilliantly in the remaining chapters: “Data Munging,” “Scores and Rankings,” “Statistical Analysis,” “Visualizing Data,” “Mathematical Models,” “Linear Algebra,” “Linear and Logistic Regression,” “Distance and Network Methods,” “Machine Learning,” and “Big Data: Achieving Scale.”

This approach might justify calling the book a manual. In my humble opinion, however, the book is more than a typical manual. In fact, the author himself designates it as a textbook for an introductory course on data science. The chapters are richly equipped with exercises. The topics are always explained starting with a proper motivation and continuing with practical examples. This is perhaps the most outstanding feature of the book. It can serve as a regular textbook for an academic course. In fact, I should like to recommend it exactly for this purpose. On the other hand, it provides a wealth of material for people from industry, such as software engineers, and can serve as a manual for them to accomplish data science tasks. It should be noted that the book is not just a text, but a much more complex product, including a full set of lecture slides available online as well as a solutions wiki.


Reviewer: P. Navrat
Review #: CR145880 (1805-0207)
