Computing Reviews

Data science and analytics with Python
Rogel-Salazar J., Chapman & Hall/CRC, Boca Raton, FL, 2017. 412 pp. Type: Book
Date Reviewed: 02/23/18

Data science is an applied science devoted to extracting useful knowledge from various sources of data generated by modern computing. Data scientists are currently in demand. Python is a programming language that is widely used in education and in the sciences.

Although the R language is the main professional language in data science, given its strengths for statistical computing and graphics, Python is much more commonly taught and very widely used in the sciences. Python is also easier to learn initially than R and has some excellent libraries for data science. The author recommends the Anaconda Python distribution, which is an easy-to-install and popular data science platform, freely available for Windows, macOS, and Linux. The use of Python, instead of R, removes a barrier to learning data science.

The first chapter entertainingly explains for a wide audience what data science is and what a data scientist does.

The longest chapter is on Python. It provides a solid review of the basics of programming in Python and shows how to use the NumPy, Pandas, SciPy, and Matplotlib libraries. The Scikit-Learn and StatsModels libraries are used in other chapters. Code snippets illustrate the use of Python and the libraries throughout the book. The data used for examples is readily available.

There are chapters on machine learning and pattern recognition, clustering, hierarchical clustering, dimensionality reduction, and support vector machines, as well as a short section on Scikit-Learn pipelines.

The author conveys the concepts and ideas behind the algorithms used, without emphasis on their implementation. Rather, algorithms are viewed as tools to be used, provided in the appropriate Python libraries.

The book is well bound, clearly printed, and easy on the eyes. Notes and captions are placed in wide right margins. There is a table of contents, a list of figures with their captions, a list of tables with their descriptions, a preface, a reader’s guide, and a bibliography. Each chapter includes a summary. The book is written with the goal of being accessible to readers from a wide variety of backgrounds. The author often includes amusing and informative but tangential remarks, which may be slightly distracting for some readers.

The book should have been more carefully edited. Among the problems are some misspellings, non-sentences, and other grammatical issues. Some sentences are poorly constructed. In the chapter on Python, the term "assignation" is used consistently instead of "assignment." The formula for Manhattan distance in the chapter on machine learning and pattern recognition has misplaced absolute value indicators. Sometimes the term "eigenvector" is used when "eigenvalue" is meant.
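To show why the placement of the absolute value indicators matters, here is a minimal sketch of the standard Manhattan distance, which sums the absolute values of the coordinate differences. The function names and sample points are hypothetical and are not taken from the book. The second function shows one possible misplacement, chosen only for illustration; it is not necessarily the error that appears in the book.

```python
def manhattan_distance(p, q):
    """Return the Manhattan (L1) distance between two equal-length points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Take the absolute value of each coordinate difference, then sum.
    return sum(abs(a - b) for a, b in zip(p, q))


def misplaced_abs(p, q):
    """Hypothetical misplacement: absolute value of the sum of differences."""
    # Differences of opposite sign cancel before the absolute value is taken.
    return abs(sum(a - b for a, b in zip(p, q)))


# Illustrative points only (not from the book).
p, q = (1, 5), (4, 1)
print(manhattan_distance(p, q))  # |1 - 4| + |5 - 1| = 7
print(misplaced_abs(p, q))       # |(1 - 4) + (5 - 1)| = 1
```

With the absolute value applied to each term, the two points are 7 units apart. With it applied to the whole sum, the differences partly cancel and the result drops to 1, so a misprint of this kind changes the result.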

To benefit the most from this book, the reader should be comfortable programming in Python and have a good grasp of a number of areas of mathematics, including topics in mathematical statistics and topics in linear algebra, such as eigenvectors, eigenvalues, singular value decomposition of matrices, and principal component analysis.

For those individuals who want to know about data science, but do not yet program or do not have the appropriate background in mathematics, chapters 1 and 3 are recommended. Combined, these give an overview of what a data scientist is and of the essential topics of machine learning and pattern recognition. Python programming knowledge and some specific mathematics knowledge are needed to follow much of the material in most chapters.

Overall, in spite of its flaws, this is a worthy book on data science.

Reviewer:  David Naugler Review #: CR145877 (1805-0205)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024™