Computing Reviews
Python for probability, statistics, and machine learning
Unpingco J., Springer International Publishing, New York, NY, 2016. 276 pp. Type: Book (978-3-319307-15-2)
Date Reviewed: Jul 8 2016

Many recent books cover some combination of Python, data science, statistics, and machine learning, and they vary widely in prerequisites and approach. This book does not include data science in its title and does not use large data sets; its examples often involve coin tossing and small sets of random values. It assumes a background in Python, probability, and statistics: an undergraduate course in mathematical probability and statistics would be necessary. The main purpose of the book seems to be to show how Python libraries can be used to implement concepts in probability, statistics, and machine learning, and the code consists mostly of library calls. Machine learning, though of growing importance, is treated only in the final chapter, in the context of probability and statistics and with trivial examples rather than large data sets. Thus, this would not be the book for someone primarily interested in machine learning.

The first chapter briefly introduces the Python libraries NumPy, Matplotlib, SciPy, pandas, and SymPy, but does not explain most of the functions used later. Anaconda is recommended for installing Python; it works well and includes the needed libraries. Alternatively, the pip tool can install the libraries into an existing Python installation. A nice feature of the book is that the entire text is available for download as interactive Python notebooks, in which the embedded code can be executed, changed, or augmented for experimentation. Each chapter also has a few additional notebooks not included in the text. Some of these notebooks, whether from the text or supplementary, contain errors.
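Before running the notebooks, it can help to confirm that the five libraries are importable. The following is a minimal sketch; the helper name `missing_libraries` is hypothetical, and the suggested pip command assumes the pip package names match the import names, which holds for these five libraries.

```python
import importlib.util

# Import names of the libraries introduced in the book's first chapter.
LIBRARIES = ["numpy", "matplotlib", "scipy", "pandas", "sympy"]


def missing_libraries(names=LIBRARIES):
    """Return the names that cannot be found by the current interpreter (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries()
if missing:
    # For these libraries the pip package name equals the import name.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All libraries found.")
```

With Anaconda, all five are normally present already, so the check should report nothing missing.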

Chapter 2 covers probability. Filling in the details of the derivations can take some work. The author has taught courses based on this material; in the example showing that conditioning can make dependent random variables independent, his comments while working through the derivation might have made it easier to follow. An online course or a video on these topics would have been more effective for readers with diverse backgrounds. The section on Monte Carlo sampling methods looked interesting, but the interactive code from the text generated an error that turned out to be caused by the expression 1/12 evaluating to zero because of integer division. The supplementary code also generated an error because the subplots() function was not defined; an explicit import was needed.
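Both problems can be illustrated in a few lines. This sketch assumes the notebook was run under Python 2, where `/` between two integers truncates toward zero; the book's actual code is not shown in the review, so the variable names here are illustrative. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is required
from matplotlib.pyplot import subplots  # explicit import; avoids "name 'subplots' is not defined"

# Python 3: "/" is true division; "//" reproduces Python 2's integer result for ints.
true_div = 1 / 12    # about 0.0833
floor_div = 1 // 12  # 0, which is what 1/12 gives for two ints under Python 2

# Spellings that also give a float under Python 2:
#   from __future__ import division   (must be the first statement in the module)
#   1.0 / 12
portable = 1.0 / 12

fig, ax = subplots()  # works once subplots has been imported
```

Writing one operand as a float (or adding the `__future__` import) keeps such a constant correct regardless of the interpreter version.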

The third chapter, on statistics, is the longest, at almost 100 pages. As motivation, it opens with three interesting, famous problems that unfortunately are never mentioned again in the chapter. The interactive code for the section on maximum likelihood estimation works well, except for the code that generates Figure 3.3, which produces a divide-by-zero warning. No explanation of the code is provided, but looking up the subplots, linspace, and plot functions online showed that the linspace boundaries caused the warning; changing them slightly removed it. Figure 3.3 also has the curious feature that each legend symbol is drawn twice rather than the more reasonable once. A search turned up a stackoverflow.com thread explaining that the numpoints parameter of the legend function defaults to two; setting it to one fixed the problem. It was fun to fix these minor problems, but it would be more reasonable for the author rather than the reader to do so.
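The review does not reproduce the code for Figure 3.3, so the following is a hypothetical reconstruction of the kind of fix it describes. It assumes the figure plots a coin-toss log-likelihood over the head probability p, which evaluates log(0) at p = 0 and p = 1 when `np.linspace` includes those endpoints. Nudging the endpoints inward removes the divide-by-zero warning, and passing `numpoints=1` to `legend` draws one marker per entry (older Matplotlib releases defaulted to two). The values of `n` and `k` are made up for illustration.

```python
import warnings

import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: k heads observed in n tosses (not the book's values).
n, k = 10, 6


def log_likelihood(p):
    """Log-likelihood of k heads in n independent tosses with head probability p."""
    return k * np.log(p) + (n - k) * np.log(1 - p)


def emits_runtime_warning(p_grid):
    """Return True if evaluating the log-likelihood on p_grid raises a RuntimeWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        log_likelihood(p_grid)
    return any(issubclass(w.category, RuntimeWarning) for w in caught)


full_grid = np.linspace(0, 1, 101)           # includes p = 0 and p = 1, so log(0) occurs
trimmed_grid = np.linspace(0.01, 0.99, 99)   # endpoints moved slightly inward

fig, ax = plt.subplots()
ax.plot(trimmed_grid, log_likelihood(trimmed_grid), "o-", label="log-likelihood")
ax.axvline(k / n, color="gray", linestyle="--", label="MLE k/n")
ax.legend(numpoints=1)  # one marker per legend entry
```

On the trimmed grid the maximum falls at p = 0.6, the maximum likelihood estimate k/n.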

The concluding Chapter 4 introduces machine learning. The decision tree section has few prerequisites. The code in the book creates a DecisionTreeClassifier and calls its fit method. Figure 4.18 shows the resulting tree with no explanation of how it was constructed, although Figure 4.19 shows a nice graph of how the tree works. The supplementary file contains code that produces Figure 4.19, citing a stackoverflow.com thread as its reference. This section is like a movie that captures your interest and then stops in the middle.
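Since Figure 4.18 shows the tree without explaining how `fit` built it, one way to inspect a fitted tree is scikit-learn's `export_text` (available in scikit-learn 0.21 and later), which prints each split test and each leaf's class. The six-point data set below is hypothetical, not the book's example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical one-feature toy data: class 1 whenever x > 2.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # greedily picks the split that most reduces Gini impurity (the default criterion)

# Text dump of the fitted tree: one line per split test or leaf.
rules = export_text(clf, feature_names=["x"])
print(rules)
```

With these data a single split suffices, placed at 2.5, the midpoint between the largest class-0 value and the smallest class-1 value.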

The book contains a lot of good material, but would have been much more useful with explanations of the Python library functions and brief overviews of the probability and statistics topics. Even with prior background, most readers will have a few gaps.

Reviewer: Arthur Gittleman. Review #: CR144561 (1609-0629)
Categories: Python (D.3.2 ...); Statistical Computing (G.3 ...); Learning (I.2.6)

Other reviews under "Python":
Practical Python
Hetland M., APress, LP, 2002.  648, Type: Book (9781590590065)
Mar 28 2003
Python programming: an introduction to computer science
Zelle J., Franklin B, 2003. Type: Book (9781887902991)
Dec 2 2004
Foundations of Python network programming
Goerzen J., APress, LP, Berkeley, CA, 2004.  512, Type: Book (9781590593714)
Dec 26 2004

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®