The collection of books on data science is becoming so large that it is itself an interesting subject for data science analysis [1,2]. This is amply justified by the rapid success of data science among students, researchers, and practitioners in both academia and industry. For such a large readership, the supply of books is very diverse. Behrman’s Foundational Python for data science fits right into this collection, though of course with its own distinctive features.
Data science is a relatively new, strongly interdisciplinary field at the intersection of math, statistics, and computer science, but it also requires economic and legal skills, among others. The increasing availability of data from many sources (social networks, Internet of Things devices, user interactions, and so on), together with affordable computational resources, has attracted the interest of many companies that want to extract value from data. Managing such data, often characterized by large volume, extreme variety, or high production velocity, requires a new profession, the data scientist, which Harvard Business Review called “the sexiest job of the 21st century.” In turn, the potential for a career in data science attracts people with heterogeneous competences, who need to be aligned on some basic skills like programming, especially in Python, which is the language of choice for data science.
Many books on Python programming focus on the use of libraries specifically suited for data science tasks, while giving little space to the fundamentals of the Python language. Other books, like Behrman’s, focus on the language (and a few libraries devoted to data science), leaving the detailed study of libraries to other books. (There is also a third category of books: those that cover both the language and its libraries in depth.) Books like Behrman’s are especially suited to study programs, for example, undergraduate programs in data science, where the study of programming is separate from the study of other topics that require specialized libraries (such as machine learning, statistics, natural language processing, and so on). Books in this category can be very succinct or quite large; Behrman’s is quite slim and may suit a short crash course on Python programming.
Learning programming for data science is different from learning programming for computer science. While the latter puts great emphasis on problem solving, abstraction, programming in the large, programming paradigms, and so on, in data science the learning approach is usually smaller in scale, mainly based on scripting, and more pragmatic. (A computer scientist knows how to implement a method; a data scientist knows how to use the implementation to reach a goal.) With this distinction in mind, Behrman’s book unfolds the basic concepts of Python programming in a classical fashion, but puts greater emphasis on the native Python data structures, which are presented before execution control and functional abstraction. (Books that follow the standard approach adopted in CS courses are usually organized around a different order of topics; see, for example, Deitel and Deitel.) Each chapter ends with a set of questions, with corresponding answers in a separate appendix. In my opinion, the provided questions are too few to learn a language that, like any programming language, requires a lot of practice.
Furthermore, some of the author’s personal choices may not find general agreement. For example, Behrman discourages the use of lambda functions (despite Python’s orientation to the functional paradigm that is so helpful in data science programming), while nothing is said about the readability issues of the reduce() function (in fact, because of these issues, Python developers have confined it to a specialized module, functools). In addition, some key Python features like lazy evaluation are overlooked (with just a trace when generators are introduced); yet lazy evaluation is what motivates the use of the map() and filter() functions, which otherwise remain inexplicably indistinguishable from comprehensions. Finally, errors in some listings could confuse the novice reader.
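To make the point about lazy evaluation concrete, here is a minimal sketch of my own (it is not taken from the book, and the data and variable names are purely illustrative). It contrasts an eager list comprehension with the lazy iterators returned by map() and filter(), and sets reduce() beside the more readable built-in sum().

```python
from functools import reduce  # in Python 3, reduce() lives in functools

values = [1, 2, 3, 4, 5, 6]  # illustrative data, not from the book

# Eager: the list comprehension builds the whole list immediately.
eager_squares = [v * v for v in values if v % 2 == 0]

# Lazy: map() and filter() return iterators; nothing is computed yet.
lazy_squares = map(lambda v: v * v, filter(lambda v: v % 2 == 0, values))

# Items are produced only on demand, one at a time.
first = next(lazy_squares)   # computes just the first even square
rest = list(lazy_squares)    # consumes whatever is left

# A generator expression is the lazy counterpart of a comprehension.
lazy_gen = (v * v for v in values if v % 2 == 0)

# reduce() works, but the built-in sum() states the intent more clearly.
total_reduce = reduce(lambda acc, v: acc + v, eager_squares, 0)
total_builtin = sum(eager_squares)
```

With small inputs like these the two styles give the same results; the difference matters when the data is large or arrives as a stream, because the lazy forms never hold the full intermediate result in memory.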
Overall, this book is best used in a short course on programming fundamentals for data science, with close supervision from a teacher who can help students by supplementing the material and providing many exercises. For self-study, deeper insight into Python, and self-paced practice, other resources are preferable.