Computing Reviews
Statistics for data scientists: an introduction to probability, statistics, and data analysis
Kaptein M., van den Heuvel E., Springer International Publishing, Cham, Switzerland, 2022. 348 pp. Type: Book (978-3-030105-30-3)
Date Reviewed: Jul 7 2022

One would hope that professionals calling themselves data scientists would have extensive training in both statistical theory and practice. Yet current data analytics curricula, while naturally including at least one general statistics course, often neglect the integration of those two essential components. This can, and sometimes does, lead to simplistic explorations, analyses, and modeling of complex datasets.

Kaptein and van den Heuvel teach statistics and data science at the Eindhoven University of Technology and at Tilburg University in the Netherlands, and have developed a new course and textbook for data scientists incorporating a more rigorous foundation in probability and statistics than found in many other popular data science texts.

The authors assume some prerequisite coursework in mathematics and programming, and have taught their course to undergraduate students in computer science, economics, and even the social sciences. Their text focuses on the use of modern applied statistical methods and includes coverage of sampling, an extremely important topic that often receives only minimal treatment. The book’s extensive examples and exercises use the R language and include numerous datasets for illustrating basic data concepts, sampling and estimation, probability, distributions, multivariate techniques, and Bayesian analysis. The authors’ website for the textbook (http://www.nth-iteration.com/statistics-for-data-scientist/) includes access to the sample datasets, R source code, and recorded whiteboard lectures.

Each chapter begins with a general introduction to the major topic and presents detailed analytical examples using R paired with the relevant theoretical concepts and formulae. The text uses a good deal of mathematical notation, which might be a challenge for some readers without the prerequisite background. One important chapter covers multivariate exploration and analysis of datasets and the concepts and measures of dependency and association for different data types. The final chapter on Bayesian statistics presents a readable and comprehensive discussion of that approach to estimation and decision-making, although entire libraries have been written on that topic. The authors nicely summarize and illustrate the differences between Bayesian and frequentist probability methods, yet admit that there is much more to learn about them.

Having taught data analytics at the introductory graduate level, I welcome the authors’ textbook as an essential resource for training well-grounded entry-level data scientists. The first requirement of the Data Science Association’s Code of Conduct [1] is competence:

A data scientist shall provide competent data science professional services to a client. Competent data science professional services requires the knowledge, skill, thoroughness and preparation reasonably necessary for the services.

Training in both the theory and practice of data analytics is a requirement for such competence. The authors’ textbook definitely provides a valuable resource for such training.

Reviewer:  Harry J. Foxwell Review #: CR147468 (2209-0120)
[1] Code of Conduct. Data Science Association. https://www.datascienceassn.org/code-of-conduct.html (accessed 7/6/2022).
Probability And Statistics (G.3 )
 