Computing Reviews, the leading online review service for computing literature.

Clustering
Xu R., Wunsch D., Wiley-IEEE Press, Hoboken, NJ, 2009. 358 pp. Type: Book (9780470276808)

Date Reviewed: Jun 5 2009

Clustering algorithms are important to a wide spectrum of scientific disciplines, ranging from computer science (CS) and engineering to the medical, earth, and social sciences. Applications of clustering are numerous: speech recognition, organization of document collections, disease diagnosis and treatment, star and planet classification, analysis of social networks, and criminal psychology, to name just a few. It is therefore not surprising that more than 12,000 scientific papers related to clustering have been published since 1996. This book provides a comprehensive and thorough presentation of this research area, describing some of the most important clustering algorithms proposed in the research literature.

The book is organized into 11 chapters that highlight the various aspects of the clustering process. Chapter 1 is a brief introduction that discusses cluster analysis in general, defines the notion of clusters, and presents some interesting clustering applications. In the second chapter, Xu and Wunsch shed light on the different proximity measures that have been established to quantify the similarity between data records. The definition of similarity between two data records is one of the most important factors in the clustering process, as it provides the basis for the identification of high-quality clusters. After discussing the basic properties that a proximity measure must satisfy, the authors present a collection of measures that are suitable for continuous, discrete, and mixed variables.

Chapters 3 to 9 are dedicated to specific clustering algorithms, technologies, and theories that have been proposed to facilitate clustering in different data domains and application environments. The last section of each of these chapters is devoted to real-world applications in which the corresponding approaches are commonly adopted.

In particular, chapter 3 collects clustering algorithms that organize the data records into a hierarchical structure; each level of this structure corresponds to a clustering solution with a different number of clusters. The clustering hierarchy can be built either bottom-up (agglomerative algorithms) or top-down (divisive algorithms). After presenting the classical hierarchical clustering schemes, the authors concentrate on a set of recent hierarchical approaches that are more robust to noise and outliers. Chapter 4 presents a set of partitional clustering solutions, where the data records are directly partitioned into a prespecified number of clusters. Xu and Wunsch present in detail the popular k-means algorithm and its advancements, as well as graph-theoretic, fuzzy, and search-based clustering methodologies. The use of neural networks in clustering is highlighted in chapter 5, where the authors discuss existing clustering approaches that are suitable for either hard or soft competitive learning. Chapter 6 is dedicated to kernel-based clustering solutions that map a set of nonlinearly separable patterns into a higher-dimensional feature space, where they are linearly separable. After presenting the theory behind kernel-based clustering approaches, Xu and Wunsch discuss nonlinear principal component analysis, squared error-based clustering, and support vector kernel-based clustering.

Chapters 7 to 9 are devoted to more recent applications involving the clustering of sequential, large-scale, or high-dimensional data. Specifically, chapter 7 focuses on the clustering of sequential data, commonly encountered in the medical sciences. In this chapter, the authors present formulas to quantify sequence similarity, as well as three clustering algorithms that are suitable for sequential data. Following this, chapter 8 deals with the clustering of large-scale data, where the scalability of the clustering algorithm is a top priority. The existing methodologies are divided into six categories: random sampling, data condensation, density-based, grid-based, divide and conquer, and incremental learning. Then, in chapter 9, the authors present a set of methods for the clustering of high-dimensional data. As part of this chapter, both linear and nonlinear projection algorithms are investigated, along with projected and subspace clustering approaches. The role of data visualization is also emphasized.

Chapter 10 presents metrics for the validation of the clustering results. The authors divide the existing metrics into three categories: external indices, internal indices, and relative indices. Finally, the last chapter of the book summarizes research challenges and presents trends in the area.

The book targets researchers and graduate students in the clustering field. However, it is easy to follow even for nonexperts, as it does not require significant background knowledge. On the positive side, the book covers a wide spectrum of real-world applications and provides rich references for further reading. On the negative side, although the book presents the workings of the algorithms in a reasonable degree of detail, it provides no specific examples of their operation. Furthermore, for some clustering algorithms, the authors do not discuss biases such as the shape of the clusters they tend to identify, or their robustness to outliers.

Reviewer: Aris Gkoulalas-Divanis	Review #: CR136915 (1004-0363)

Clustering (I.5.3)

General (F.2.0)
