Data analysis by Graham Upton and Dan Brawn is a concise and to-the-point guide to data analysis. Compared to other statistics and data analysis books, this one is well written without delving into extensive details on each topic, making it accessible to college-level students who want to grasp the basics in a short time. For newcomers to data analysis and its foundational concepts, this book serves as an excellent starting point. The authors present the material in a straightforward and clear manner, allowing readers to comprehend the essentials in a single sitting. The book focuses solely on the main points necessary for a deeper exploration of data analysis, making it suitable for beginners rather than those seeking to deepen an existing understanding.
The authors begin by explaining basic terms such as variables, population, sample, and discrete and continuous data. The second section covers probabilities and probability distributions, including estimation and confidence intervals. The book then moves on to discuss models, p-values, and hypothesis testing. Finally, it introduces classification techniques such as naive Bayes, k-nearest neighbors (KNN), and support vector machines (SVMs). Together, these concepts provide a solid foundation for a first-year college course on data analysis and a preliminary step toward a more in-depth understanding of the subject.
However, Data analysis does not cover inferential statistics and their role in data analysis. Inferential statistics are used to calculate a p-value, which is the probability, assuming the null hypothesis is true, of obtaining results at least as extreme as those observed. Investigators compare the p-value against a prespecified level of significance, which makes it central to data analysis. Additionally, when presenting confidence intervals, it would have been helpful if the authors had included instructions on calculating z-values and explained their relationship to probability. Standardizing the values (raw scores) of a normal distribution by converting them into z-scores makes it possible to find the probability that an observation falls at or below (or beyond) a given value in that distribution.
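To illustrate the kind of worked example that would have helped here, the following is a minimal sketch of the relationship between raw scores, z-scores, and probability. It is not taken from the book. The function names and the numbers (a mean of 100, a standard deviation of 15, an observed value of 115) are hypothetical, and the sketch assumes the data are normally distributed.

```python
import math


def z_score(x, mean, std):
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mean) / std


def standard_normal_cdf(z):
    """Probability that a standard normal variable is at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p_value(z):
    """Probability of a z-value at least as extreme as z in either tail,
    assuming the null hypothesis (standard normal) is true."""
    return 2.0 * (1.0 - standard_normal_cdf(abs(z)))


# Hypothetical example values, chosen only for illustration.
z = z_score(x=115.0, mean=100.0, std=15.0)
p_below = standard_normal_cdf(z)   # probability of a value at or below 115
p_two_sided = two_sided_p_value(z)

print(f"z = {z:.2f}")
print(f"P(X <= 115) = {p_below:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

In this sketch the two-sided p-value would be compared against a prespecified significance level such as 0.05. A z-value of about 1.96 corresponds to a two-sided p-value of roughly 0.05, which is why that value appears in 95% confidence intervals.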
While the book covers the most important aspects of data analysis in a concise and accessible way, it lacks practice problems or solved exercises, which are commonly found in other textbooks. Moreover, only a few references are provided at the end of each chapter. Also missing are instructions on using tools like Excel, SAS, or the R language for data analysis, which could help readers learn to analyze their own data. Therefore, it might not be suitable as a textbook for a comprehensive data analysis course.