Computing Reviews, the leading online review service for computing literature.

Multi-relational pattern mining over data streams
Silva A., Antunes C. Data Mining and Knowledge Discovery 29(6): 1783-1814, 2015. Type: Article

Date Reviewed: Apr 26 2016

Mining large amounts of data in real time remains a great challenge. This paper addresses an important theme in this area and gives an algorithm for mining frequent relational patterns over data streams that are represented as batches of star-schema records. The treatment is deterministic. The algorithm builds a compact data structure called a pattern tree, which efficiently stores all patterns whose frequency exceeds a given threshold. The guarantee is that every item set whose true frequency exceeds the threshold is reported (there are no false negatives). Frequent item sets are stored in a sophisticated tree structure consisting of local- and global-level trees, which uses both explicit references and hashing.

A few algorithms with similar aims exist in the literature. The authors qualitatively compare their comprehensive approach to others, some of which are probabilistic. Throughout the paper, the ideas, approach, and solutions are explained using a simple running example, and the details of the complex data structures are illustrated by further examples. The essence of the two-pass algorithm is given in pseudocode, which is then discussed.

The algorithm has been implemented and run on two databases: a demo sales database with about 60,000 facts (the AdventureWorks 2008 Data Warehouse, created by Microsoft) and Hepatitis, a real dataset from the healthcare domain that is ten times larger than the former. The results are discussed with respect to accuracy, size of the pattern tree, execution time, and memory consumption.

The authors deal with simple dimensions without aggregations. An interesting refinement would be to generalize the data structure and the algorithm to handle correlated dimensions, producing non-redundant skeletons as the bases of snowflake schemas. Another direction that would make the algorithm suitable for broader application areas would be to integrate it into the dimension conversion phase of filling a data warehouse. For data marts that depend on and are loaded from the warehouse, a multistar representation of this data could then be retrieved in a straightforward way.
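
To make the "no false negatives" guarantee concrete, the following is a minimal sketch of deterministic frequent item set counting over a stream of records. It is not the authors' pattern-tree algorithm: it swaps in the well-known Lossy Counting idea, applied to item sets, and uses a flat dictionary rather than local- and global-level trees. The class name `LossyItemsetCounter`, the parameters `epsilon` and `max_size`, and the attribute=value strings are all hypothetical, chosen only for illustration; each record is assumed to be a fact row already joined with its dimension attributes.

```python
from itertools import combinations
from math import ceil


class LossyItemsetCounter:
    """Lossy Counting over item sets (illustrative, not the paper's method)."""

    def __init__(self, epsilon, max_size=2):
        self.epsilon = epsilon              # allowed error as a fraction of the stream
        self.width = ceil(1 / epsilon)      # number of records per bucket
        self.max_size = max_size            # largest item set size that is tracked
        self.n = 0                          # records seen so far
        self.entries = {}                   # item set -> [count, max_missed]

    def add_record(self, items):
        self.n += 1
        bucket = ceil(self.n / self.width)
        unique = sorted(set(items))
        for size in range(1, self.max_size + 1):
            for itemset in combinations(unique, size):
                entry = self.entries.get(itemset)
                if entry is None:
                    # A new entry may have been missed in up to bucket - 1 records.
                    self.entries[itemset] = [1, bucket - 1]
                else:
                    entry[0] += 1
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot yet be frequent.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def add_batch(self, records):
        for record in records:
            self.add_record(record)

    def frequent(self, support):
        # Every item set with true frequency >= support * n passes this test,
        # because each stored count undercounts by at most epsilon * n.
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }


if __name__ == "__main__":
    counter = LossyItemsetCounter(epsilon=0.25, max_size=2)
    counter.add_batch([
        ["product=bike", "customer=young"],
        ["product=bike", "customer=young"],
        ["product=bike", "customer=senior"],
        ["product=bike", "customer=young"],
    ])
    print(counter.frequent(support=0.5))
```

The trade-off in this sketch is the usual one for such schemes: the reported set may contain item sets whose true frequency lies slightly below the threshold (by at most `epsilon` times the number of records), but none above the threshold is ever omitted.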

Reviewer: K. Balogh	Review #: CR144354 (1607-0531)

Data Mining (H.2.8 ... )

Information Storage And Retrieval (H.3 )
