When analyzing data, looking at an item in isolation reveals very little of interest. It is only when the item is viewed in context and in relation to other items that patterns emerge and understanding is possible. This principle is at the heart of data analysis and represents the foundation of this book. Data becomes interesting when it is viewed as a whole. Properties that are invisible on an individual basis become apparent when considering the global context.
This brief eight-chapter book seeks to give the reader the tools to analyze high-dimensional datasets and spaces. Through the lens of cluster analysis, the author provides the necessary grounding by way of three key concepts: skeletons, empty spaces, and rankings. Each is described in turn before the three are brought together into a framework for extracting understanding from high-dimensional spaces.
The book follows a very gentle trajectory. Beginning with basic concepts, the author builds a foundation of analysis tools and principles that are later used to derive understanding from high-dimensional spaces. This gentle approach makes the book accessible to those unfamiliar with the field of data analysis. However, the lack of formal rigor might make it less interesting for the more advanced reader. There is very little mathematical treatment; this benefits novice readers, but could frustrate those familiar with the field.
Two of the central concepts, the skeleton and the empty space, provide the basis for the applications described in later chapters. Both abstractions enable downstream analysis. Simply put, the skeleton provides a way to describe the relationships between clusters, and the empty space defines interesting regions within the space. The author uses these simple and intuitive concepts to analyze high-dimensional spaces in a variety of settings (as outlined in chapter 7).
The main strength of this book could also be its weakness. Throughout, the author takes a hands-off, descriptive approach. For example, when discussing clustering algorithms, the author briefly outlines various possibilities and touches on the strengths and weaknesses of each, leaving it up to the reader to decide which algorithm to employ. If, as is apparently the case, the book is aimed at an audience unfamiliar with clustering, then such a nonprescriptive approach could be problematic. A reader who chooses an inappropriate clustering algorithm will start with an inappropriate or “bad” clustering, and all subsequent analysis will be tainted. To combat this, the author suggests a multiple-go strategy, in which multiple clusterings are compared in some manner and the most appropriate is chosen. This is where the penultimate chapter is key, as it encourages the use of context to help improve understanding.
Overall, this short book is a good introduction to the area of cluster analysis of high-dimensional data. It should be noted that it uses a specific set of tools developed by the author, namely the skeleton and empty space. However, the description of each step leaves enough room for the reader to control key aspects, which makes the approach useful in a wide variety of settings. As with individual data items, books such as this one are most useful when viewed in the context of and in relation to other works. In that light, this book is a useful addition to the existing literature on cluster analysis in high-dimensional spaces, providing a starting point for those wanting an initial grounding in the area. Due to its brevity, however, I recommend that those wanting a more detailed understanding of the area read more in-depth books alongside it.