Computing Reviews
Mathematical tools for data mining (1st ed.): set theory, partial orders, combinatorics
Simovici D., Djeraba C., Springer Publishing Company, Incorporated, 2008. 628 pp. Type: Book (9781848002005)
Date Reviewed: Feb 6 2009

The dependence of data mining on mathematics, particularly on set theory, statistics, and linear algebra, is unquestionable, and numerous examples underline this reliance. Frequent itemset mining, for instance, rests directly on set theory. Linear algebra has laid the groundwork for dimensionality reduction techniques, which map the data to an attribute space of lower dimensionality and thereby enable more scalable and more accurate data mining methodologies. Furthermore, probability theory, widely employed in statistics, provides the underlying principles on which popular classification and clustering algorithms are based. In fact, the use of statistical approaches to data mining has become a research field in its own right, with numerous publications.

This reference book provides an in-depth exploration of the most commonly adopted mathematical tools that support the data mining process. Knowledge of these tools is of paramount importance to data mining researchers, as it enables them to understand the existing data mining literature and to propose novel mining approaches. In particular, Simovici and Djeraba present the set-theoretic foundations of data mining. The book is organized into four parts, with a total of 15 chapters. Each chapter sheds light on a specific area of mathematics that plays a significant role in data mining, and offers numerous exercises and references for further reading.

The first part of the book is devoted to set theory, relations and functions defined on sets, elementary algebraic structures, and the basics of graph modeling. Chapter 1 introduces the reader to set theory and demonstrates the application of its concepts to relational databases. Chapter 2 discusses algebraic structures on sets. After presenting the basic operations on sets, the authors proceed to discuss several types of algebras, morphisms, congruences, and subalgebras. Next, the authors discuss the underlying properties of linear spaces and matrices. Chapter 3 elaborates on graph modeling by covering both directed and undirected graphs, as well as hypergraphs.

The second part of the book is devoted to algebraic structures that are closely related to partial orders. Chapter 4 introduces partially ordered sets. Chapter 5 presents lattices, a structure of paramount importance for the discovery of frequent itemsets in transactional datasets. The authors highlight some interesting topics in lattice theory, such as complete lattices and Galois connections. In addition, the authors discuss the use of Boolean algebra for the discovery of minimal sets of features in a dataset, through logical data analysis. An introduction to point-set topology and the related measures is offered in chapter 6. Topologies are essential to data mining tasks that involve search algorithms that take into consideration the local properties of the dataset at hand. Chapters 7 and 8 elaborate on some data mining applications of the concepts presented. Among them, the commonly used tasks of frequent itemset mining and association rule mining are presented in detail. Finally, chapter 9 explains rough sets, as well as the theory behind decision trees, which are commonly adopted for classification.
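To make the link between the subset lattice and frequent itemset mining concrete, here is a minimal level-wise sketch in Python. It is not taken from the book; the function names, the toy transactions, and the support threshold are assumptions chosen for illustration. It relies only on the standard fact that support is anti-monotone with respect to set inclusion: every subset of a frequent itemset is itself frequent, so the search can climb the lattice one level at a time and prune early.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise search of the subset lattice (illustrative sketch).

    `transactions` is a list of sets; the result maps each frequent
    itemset (as a frozenset) to its support count.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    level = [frozenset([i]) for i in items]  # lattice level 1: singletons
    while level:
        kept = {}
        for candidate in level:
            count = support(candidate, transactions)
            if count >= min_support:
                kept[candidate] = count
        frequent.update(kept)
        k = len(level[0]) + 1  # size of itemsets on the next level
        # Join frequent sets of size k-1 into candidates of size k.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Anti-monotonicity: discard a candidate if any (k-1)-subset is infrequent.
        level = [
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical toy data, chosen only for illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = frequent_itemsets(transactions, min_support=3)
```

With this toy data and a threshold of 3, the three singletons and the three pairs are frequent, while the full set {a, b, c} appears in only two transactions and is pruned.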

The third part of the book focuses on metric spaces and their application to clustering, classification, and data preprocessing. Chapter 10 introduces dissimilarity functions and their specializations, including metrics, tree metrics, and ultrametrics. These metrics are discussed for different data types, such as n-dimensional vectors, subsets or partitions of finite sets, and sequences. Chapter 11 studies the topological properties of metric spaces, while chapter 12 introduces the dimension theory of metric spaces and the curse of dimensionality. Finally, chapter 13 is dedicated to clustering approaches. The authors discuss some basic types of clustering algorithms, their limitations, and some techniques for evaluating the quality of a clustering.
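The distinction between a metric and an ultrametric can be checked by brute force on a small point set. The sketch below is an illustration under assumed names and toy data, not the book's presentation: a metric satisfies the triangle inequality d(x, z) <= d(x, y) + d(y, z), while an ultrametric satisfies the stronger condition d(x, z) <= max(d(x, y), d(y, z)).

```python
from itertools import product


def satisfies_triangle(points, d):
    """True if d(x, z) <= d(x, y) + d(y, z) for all triples of points."""
    return all(
        d(x, z) <= d(x, y) + d(y, z)
        for x, y, z in product(points, repeat=3)
    )


def satisfies_ultrametric(points, d):
    """True if d(x, z) <= max(d(x, y), d(y, z)) for all triples of points."""
    return all(
        d(x, z) <= max(d(x, y), d(y, z))
        for x, y, z in product(points, repeat=3)
    )


def absolute_difference(x, y):
    """Usual distance on the real line."""
    return abs(x - y)


def discrete(x, y):
    """Discrete dissimilarity: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


# Hypothetical toy point set.
points = [0, 1, 2]
```

On these three points the absolute difference passes the triangle test but fails the ultrametric one (the distance from 0 to 2 is 2, which exceeds the maximum of the two unit steps), whereas the discrete dissimilarity passes both.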

The fourth part of the book is devoted to combinatorics. Chapter 14 presents a set of combinatorial techniques that are essential for data mining. The authors discuss the inclusion-exclusion principle, Ramsey’s theorem, the combinatorics of partitions, and counting problems related to collections of sets. Next, chapter 15 presents the Vapnik-Chervonenkis dimension of a collection of sets, a notion used in machine learning.
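For readers unfamiliar with the Vapnik-Chervonenkis dimension, a brute-force computation on a tiny example may help. The sketch below is an illustration with assumed names and data, not the book's treatment, and it presumes a non-empty collection. A collection of sets shatters a subset S if intersecting the members of the collection with S yields every one of the 2^|S| subsets of S; the VC dimension is the size of the largest shattered subset.

```python
from itertools import combinations


def shatters(collection, subset):
    """True if the traces {c & subset} cover all 2**len(subset) subsets."""
    traces = {c & subset for c in collection}
    return len(traces) == 2 ** len(subset)


def vc_dimension(universe, collection):
    """Largest k such that some k-element subset of `universe` is shattered.

    Brute force, exponential in the size of the universe; assumes a
    non-empty collection of frozensets. Any subset of a shattered set is
    also shattered, so the search can stop at the first size that fails.
    """
    best = 0
    for k in range(1, len(universe) + 1):
        if any(
            shatters(collection, frozenset(s))
            for s in combinations(universe, k)
        ):
            best = k
        else:
            break
    return best


# Hypothetical example: all intervals {i, ..., j-1} of the points 1..4,
# including the empty interval.
universe = [1, 2, 3, 4]
intervals = [frozenset(range(i, j)) for i in range(1, 6) for j in range(i, 6)]
```

Here the intervals shatter the pair {1, 3} but no triple, since no interval can contain the two outer points of a triple while skipping the middle one, so the computed dimension is 2.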

Overall, Simovici and Djeraba’s presentation of both the theoretical grounds and the practical aspects of the various data mining methodologies is good. The writing style is easy to follow and the ideas are presented concisely. The book is intended for readers who have a data mining background and are familiar with basic calculus. It will help this audience improve their understanding of how different data mining strategies operate from a mathematical standpoint. On the negative side, data mining applications receive too little emphasis, and several data mining approaches that are based on mathematical tools are missing.

Reviewer:  Aris Gkoulalas-Divanis Review #: CR136494 (0912-1128)
Set Theory (F.4.1 ... )
Data Mining (H.2.8 ... )
Deduction (I.2.3 ... )
Combinatorics (G.2.1)
Graph Theory (G.2.2)
Knowledge Representation Formalisms And Methods (I.2.4)
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®