When onions and tomatoes are in a basket, lettuce is likely to be there as well. Data mining can discover a large number of association rules like this one. Research in association rule mining has therefore defined a variety of measures of how interesting a pattern is, so that only strong patterns with a high degree of interestingness are reported.
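To make the notion of an interestingness measure concrete, the sketch below scores the rule {onions, tomatoes} -> {lettuce} with support, confidence, and lift, three widely used measures. The basket data are invented for illustration, and treating these three as representative of the 61 measures studied in the paper is an assumption, since the measures are not listed here.

```python
from fractions import Fraction

# Hypothetical toy baskets, invented purely for illustration.
BASKETS = [
    {"onions", "tomatoes", "lettuce"},
    {"onions", "tomatoes", "lettuce", "bread"},
    {"onions", "tomatoes"},
    {"lettuce", "bread"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support.

    A value above 1 means the antecedent makes the consequent more likely.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


antecedent = {"onions", "tomatoes"}
consequent = {"lettuce"}

print("support:   ", support(antecedent | consequent, BASKETS))
print("confidence:", confidence(antecedent, consequent, BASKETS))
print("lift:      ", lift(antecedent, consequent, BASKETS))
```

On this toy data the rule has support 2/5, confidence 2/3, and lift 10/9, so the three measures already describe the same rule on different scales. That is one reason a systematic comparison of measures is useful.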
This paper analyzes 61 known measures and applies each of them to 110 different datasets to arrive at a categorization. The datasets come from the University of California at Irvine, from gene expression medical data repositories, and from multi-class protein folding; their number of attributes ranges from four to 1559, and their number of attribute-value pairs from eight to 3121.
The contribution of this research lies in the juxtaposition of the theoretical definitions of the measures with their empirical behavior. Several past discrepancies are identified, and equivalences between some existing measures are established, reducing their total number to 21. The consequences of this work are significant: instead of computationally complex measures, similar alternatives can be chosen. Ultimately, however, it is the knowledge domain that remains the deciding factor in choosing an approach, which makes the research on these association rules relevant mostly to theoreticians.
The paper is well written, with a comprehensive survey of related work, sound theory, and visual presentations that support the discussion of the findings. These visual presentations are easy to remember and can serve as quick references whenever rule mining alternatives are being considered.