Computing Reviews

Data science and analytics with Python
Rogel-Salazar J., Chapman & Hall/CRC, Boca Raton, FL, 2017. 412 pp. Type: Book
Date Reviewed: 06/01/18

Data science and analytics can be key to improving many things: marketing, understanding, systems modeling, and predictions (from racehorses to presidents). And you don't need to be an expert to use good tools to improve your understanding, even if only a little.

The book shows readers how to use Python together with Pandas (a data management framework), NumPy (for number crunching), and Scikit-learn (a collection of machine learning algorithms). It is more an introduction to these tools, including how to combine them, than an exhaustive treatment of the field (it's not even clear that such a thing could fit in a single volume).
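As a rough illustration of how these three libraries fit together, here is a short Python sketch (hypothetical, not code taken from the book): NumPy generates a small synthetic dataset, Pandas holds it in a labelled table, and Scikit-learn's `KMeans` clusters it, the kind of method the book covers in its clustering chapter. The data, column names, and parameter values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# Pandas stores the raw NumPy array as a labelled table.
df = pd.DataFrame(points, columns=["x", "y"])

# Scikit-learn fits k-means with k=2 and returns one cluster label per row.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["x", "y"]].to_numpy())

# Summarize each cluster by the mean of its points.
print(df.groupby("cluster")[["x", "y"]].mean())
```

The same pattern (load or build data, tidy it in a DataFrame, hand the numeric columns to a Scikit-learn estimator) carries over to the regression and classification methods as well.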

There are nine chapters (with nicely whimsical names):

(1) “The Trials and Tribulations of a Data Scientist”: covers data science and data scientists and talks briefly about the kinds of workflow that may be useful.
(2) “Python: For Something Completely Different”: a brief introduction to Python and NumPy.
(3) “The Machine that goes ‘Ping’”: covers some basics of machine learning, including an introduction to Scikit-learn.
(4) “The Relationship Conundrum”: regression of various sorts and its pitfalls.
(5) “Jackalopes and Hares”: clustering, mostly focusing on k-means.
(6) “Unicorns and Horses”: classification, including k-nearest neighbors and a touch of Bayesian classification.
(7) “Decisions, Decisions”: hierarchical clustering, decision trees, and ensemble methods.
(8) “Less Is More”: dimensionality reduction, principal component analysis, and singular value decomposition.
(9) “Kernel Tricks up the Sleeve”: support vector machines.

The layout is pleasant: the main text runs in one column, with notes and references in a parallel column alongside it. While this uses a bit more paper, it makes it easy to connect notes and references to the text they annotate, which noticeably improves readability.

Most chapters include good examples that are clearly explained, and all of the basic code is given. The datasets are available online, so readers can easily try the examples as they go.

The book is very practical, and the theory behind some of the methods is not presented in any detail. This isn't really a problem, as there are enough pointers to further reading for those who are interested, but a passing familiarity with some of the theory is likely to make things a bit more palatable.

I'm not sure I'd recommend this book as a primary text, but it could be good supplemental reading for a course that covers more of the theory, as it provides nice examples. More likely, interested practitioners could pick it up and, in a few days, learn enough about data science to become productive contributors. A downside (though not a major one) is that the code is not provided as Jupyter notebooks for interactive exploration; notebooks could offer not only the example code but also suggestions for further experiments for readers to build and run.


Reviewer: Jeffrey Putnam; Review #: CR146059 (1808-0415)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™