Computing Reviews
Frequent pattern mining
Aggarwal C., Han J. (eds.), Springer Publishing Company, Incorporated, New York, NY, 2014. 471 pp. Type: Book (978-3-319078-20-5)
Date Reviewed: Apr 6 2015

Data mining is about extracting useful information and knowledge from large volumes of data, which can take the form of various numbers and facts. Companies that serve thousands of consumers have always needed data mining tools, and there has been much recent interest from academia; as a result, the domain has seen rapid improvement.

Though the frequent pattern mining model developed later than the other data mining formulations, it has taken a central place in the area. There have been many publications on this topic in general, but there is still a need for a book that exhaustively covers all aspects of the subject. The editors set out with the ambitious goal of meeting that need, and they mostly accomplish it. The presented material assumes some previous exposure to the problems involved, so students and practitioners can use it with some training.

This book is organized into 18 chapters. Chapter 1 starts with a descriptive introduction to the domain with its four categories: technique-centered, scalability issues, advanced data types, and applications. Chapters 2 through 8 cover techniques, and chapters 9 and 10 cover scalability. Advanced data types are discussed in chapters 11 through 14, and privacy issues are covered in chapter 15. Details on applications are in chapters 16 through 18.

Chapter 2 provides some introductory material and surveys join-based algorithms like Apriori (a level-wise mining method) and lexicographic or enumeration tree-based algorithms. Toward the end, condensed representation of itemsets is described.
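
For readers who have not seen a level-wise method, the following is a minimal sketch of the Apriori idea in Python. It is not taken from the book; the function name `apriori`, the parameter `min_support` (an absolute count), and the sample transactions are illustrative assumptions. Each level joins frequent (k-1)-itemsets into k-itemset candidates, prunes candidates that have an infrequent subset, and then counts the survivors in one pass over the data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the data for this level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Made-up sample data, not from the book.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

On this sample, every single item and every pair is frequent, while the triple appears in only two baskets and is dropped at level 3.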

A new method that adopts divide-and-conquer to grow patterns, while avoiding an expensive step of candidate generation, is the subject of chapter 3. Using a pattern lattice model, where each node represents a pattern, chapter 4 groups mining algorithms into three categories: pattern enumeration, pattern merging, and pattern traversal. These paradigms are studied for mining long patterns. Enumeration methods evaluate all candidates of the pattern lattice in a breadth-first or depth-first order. Merge methods reach long patterns by taking leaps. Traversal methods identify a long pattern and explore adjacent patterns.

Patterns describe only part of the data, and pattern mining methods tend to find many patterns; therefore, pruning unnecessary patterns becomes important. Frequency alone is not sufficient as a criterion, so chapter 5 discusses various measures of “interestingness” along with algorithms for extracting such patterns. These techniques are based on heuristics and meant only for binary data. Chapter 8 pursues a similar goal by asking for the set of patterns that is optimal with regard to a global “interestingness” criterion. The compression problem is formalized using the minimum description length (MDL) principle.

The relationship between the absence of an item and the presence of another is represented using negative association rules. Discovering such rules is more complex and computationally expensive. Chapter 6 reviews the research in this area, which so far has been relatively limited. General systems that provide easy-to-use interfaces for specifying the constraints to be used during the search are the topic of chapter 7. Special-purpose languages are used in graph/sequence mining systems, but structured query language (SQL)-based approaches are possible for itemset patterns and association rules.

In streaming environments, mining algorithms are allowed only a single pass over the data, which poses additional challenges: the number of patterns to be considered is exponential, memory requirements are substantial, and accuracy must be balanced against efficiency. A survey of algorithms for finding frequent patterns in data streams is presented in chapter 9, with the basic principle of lossy counting as the core approach. Another aspect of scalability, related to big data, is the topic of chapter 10.
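
As an illustration of the single-pass constraint, here is a small sketch of lossy counting for individual items (not itemsets), following the commonly described bucket-based scheme. It is not code from the book; the class name `LossyCounter`, its method names, and the sample stream are assumptions made for this example. The stream is split into buckets of width ceil(1/epsilon), each entry stores a count plus the maximum amount by which that count may be too low, and low-count entries are discarded at every bucket boundary.

```python
import math


class LossyCounter:
    """Approximate item counts over a stream in one pass (illustrative sketch)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max_undercount]

    def add(self, item):
        self.n += 1
        # Current bucket id, computed with integer arithmetic (ceil(n / width)).
        bucket = (self.n + self.width - 1) // self.width
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and dropped in earlier buckets.
            self.entries[item] = [1, bucket - 1]
        # At a bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: value for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support):
        """Items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {key: value[0] for key, value in self.entries.items()
                if value[0] >= threshold}


# Made-up stream: two heavy items followed by 20 items seen once each.
counter = LossyCounter(epsilon=0.1)
stream = ["a"] * 50 + ["b"] * 30 + ["x%d" % i for i in range(20)]
for element in stream:
    counter.add(element)
heavy = counter.frequent(support=0.2)
```

The trade-off the review mentions is visible in `epsilon`: a smaller value gives tighter counts but keeps more entries in memory.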

Mining customer purchase patterns and web access patterns involves discovering frequent subsequences as patterns in a sequence database. The challenge here is the explosive number of intermediate subsequences. Chapter 11 presents an overview of the algorithms, and categorizes them into Apriori-based and pattern growth-based approaches. The pattern growth paradigm avoids the candidate generation and pruning steps and narrows the search space by mining smaller projected databases separately.
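
The idea of mining projected databases can be sketched as follows. This is a simplified, PrefixSpan-style illustration restricted to sequences of single items, not the algorithms as presented in the book; the function name `prefix_span`, the absolute `min_support` count, and the sample sequences are assumptions. For each frequent item, the database is projected onto the suffixes that follow the item's first occurrence, and the same procedure is applied recursively to that smaller database.

```python
def prefix_span(sequences, min_support):
    """Return {pattern_tuple: support} for frequent subsequences (sketch)."""
    results = {}

    def mine(prefix, projected):
        # Count each item once per suffix in the projected database.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep only what follows the first occurrence of item.
            next_projected = [
                suffix[suffix.index(item) + 1:]
                for suffix in projected if item in suffix
            ]
            mine(pattern, next_projected)

    mine((), [tuple(s) for s in sequences])
    return results


# Made-up sequence database; each string is a sequence of one-letter events.
sequence_db = ["abc", "acb", "ab", "bc"]
patterns = prefix_span(sequence_db, min_support=3)
```

Because each recursive call only looks at the suffixes that can still extend the current prefix, the search never enumerates subsequences that are absent from the data.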

Smart transportation, urban planning, and social network analysis require pattern mining from spatiotemporal data. Such patterns can be classified into individual periodic patterns, pairwise movement patterns, and aggregate patterns over multiple trajectories. Chapter 12 reviews the state of the art of the methods used for such problems. Chapter 13 examines frequent subgraph mining algorithms and introduces studies on significant, representative, and dense subgraph patterns. New settings such as graph streams are also discussed. Chapter 14 covers models and algorithms applicable to real-life situations with uncertain data.

Since each chapter is designed as a survey and is contributed by different experts, this book can also be seen as a collection of advanced papers in the field. The editors have grouped related chapters together, though the connection and continuity typically found in standard textbooks are hard to find here. One common strength of every chapter is its comprehensive list of references, a great asset that will aid researchers in further investigating the open problems suggested here.

Reviewer:  Paparao Kavalipati Review #: CR143309 (1507-0553)
Data Mining (H.2.8 ... )
General (I.5.0 )