Computing Reviews
Multi-relational pattern mining over data streams
Silva A., Antunes C. Data Mining and Knowledge Discovery 29(6): 1783-1814, 2015. Type: Article
Date Reviewed: Apr 26 2016

Mining large amounts of data in real time remains a great challenge. This paper addresses an important problem in this area: it presents an algorithm for mining frequent relational patterns over data streams that arrive as batches of star-schema records. The approach is deterministic.

The algorithm builds a compact data structure called a pattern tree, which efficiently stores all of the data whose prevalence exceeds a given threshold. Every item set whose true frequency exceeds the threshold is guaranteed to be reported (there are no false negatives). Frequent item sets are kept in a sophisticated arrangement of local- and global-level trees that uses both explicit references and hashing.
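To make the "no false negatives" guarantee concrete, the following is a minimal sketch of classic Lossy Counting (Manku and Motwani) applied to batches. It is not the authors' pattern tree and it ignores the star-schema structure entirely; it is a swapped-in, well-known deterministic technique that offers the same kind of guarantee. The class name `LossyCounter`, its methods, and the parameter `epsilon` are hypothetical names chosen for illustration, and each stream element is assumed to be any hashable value (for instance a `frozenset` standing for an item set).

```python
import math


class LossyCounter:
    """Illustrative Lossy Counting sketch; NOT the algorithm of the reviewed paper."""

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must lie strictly between 0 and 1")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # number of elements per bucket
        self.n = 0                           # elements seen so far
        self.entries = {}                    # item -> [count, max possible undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been pruned earlier, at most bucket - 1 times.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop items that cannot be frequent so far.
            self.entries = {
                key: val for key, val in self.entries.items()
                if val[0] + val[1] > bucket
            }

    def add_batch(self, batch):
        for item in batch:
            self.add(item)

    def frequent(self, support):
        # Every item with true frequency >= support * n is returned;
        # stored counts undercount the truth by at most epsilon * n.
        cutoff = (support - self.epsilon) * self.n
        return {key: val[0] for key, val in self.entries.items() if val[0] >= cutoff}
```

Calling `add_batch` once per arriving batch and `frequent(support)` at query time returns every item whose true relative frequency reaches `support`, possibly together with a few items whose frequency lies between `support - epsilon` and `support`. Memory stays bounded because rare items are pruned at each bucket boundary.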

A few algorithms with similar aims exist in the literature. The authors qualitatively compare their comprehensive approach to these alternatives, some of which are probabilistic.

Throughout the paper, the ideas, approach, and solutions are explained through a simple example, and the details of the complex data structures are illustrated with further examples. The essence of the two-pass algorithm is given in pseudocode, which the authors discuss.

The algorithm has been implemented and run on two databases: a demo sales database with about 60,000 facts (Adventure-Works 2008 Data Warehouse, created by Microsoft) and Hepatitis, a real dataset from the healthcare domain that is ten times larger. The results are discussed with respect to accuracy, size of the pattern tree, execution time, and memory consumption.

The authors deal with simple dimensions without aggregations. An interesting extension would be to generalize the data structure and the algorithm to handle correlated dimensions, producing non-redundant skeletons as the bases of snowflake schemas. Another direction that would make the algorithm suitable for broader application areas would be to integrate it into the dimension conversion phase of filling a data warehouse. For data marts that depend on and are loaded from the warehouse, a multistar representation of this data could then be retrieved in a straightforward way.

Reviewer:  K. Balogh Review #: CR144354 (1607-0531)
Data Mining (H.2.8 ... )
Information Storage And Retrieval (H.3)