The data mining problem is how to extract new, useful, understandable patterns from a large dataset. Rough set theory provides a good framework for datasets that are large, noisy, and dynamic, and that have few external theoretical underpinnings. The problem is therefore one of selecting the features of the data that give rise to the desired classification (such as the “factors” leading to a “successful” as opposed to an “unsuccessful” medical operation), followed by iterative removal of features to identify “rules” as accurately as possible in the noisy environment. Four algorithms are developed and analyzed in terms of their time complexity and accuracy, and they are considered in relation to medical examples.
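The iterative-removal idea summarized above can be sketched in standard rough-set terms: drop condition attributes one at a time so long as the decision-consistent ("positive") region of the dataset is unchanged. The sketch below is illustrative only; the attribute names and toy data are my own assumptions, not drawn from the paper under review, and the greedy search is one simple reduction strategy rather than any of the paper's four algorithms.

```python
from itertools import groupby

def partition(rows, attrs):
    """Group row indices into indiscernibility classes: rows that
    agree on every attribute in attrs fall into the same block."""
    key = lambda i: tuple(rows[i][a] for a in attrs)
    idx = sorted(range(len(rows)), key=key)
    return [frozenset(g) for _, g in groupby(idx, key=key)]

def positive_region(rows, attrs, decision):
    """Indices of rows whose indiscernibility class is consistent
    on the decision attribute (classified without ambiguity)."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos |= block
    return pos

def reduce_attributes(rows, attrs, decision):
    """Greedily drop attributes while the positive region is preserved."""
    target = positive_region(rows, attrs, decision)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if trial and positive_region(rows, trial, decision) == target:
            kept = trial
    return kept

# Toy, made-up medical-style decision table.
rows = [
    {"fever": 1, "cough": 1, "age": 0, "outcome": "success"},
    {"fever": 1, "cough": 0, "age": 1, "outcome": "success"},
    {"fever": 0, "cough": 1, "age": 0, "outcome": "failure"},
    {"fever": 0, "cough": 0, "age": 1, "outcome": "failure"},
]
print(reduce_attributes(rows, ["fever", "cough", "age"], "outcome"))
# → ['fever']  (fever alone determines the outcome in this toy data)
```

In this toy table, removing "cough" and "age" leaves the positive region intact, so the greedy search keeps only "fever"; in noisy real data one would instead tolerate a bounded drop in the positive region, which is where the accuracy analysis the paper reports becomes relevant.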
This research paper is hard to follow, with words such as “features” used in their natural and technical senses simultaneously. The layout is unhelpfully compact, and some choices of notation are ugly. Despite this, and despite an approach that is more theoretical than the results require, the paper produces good guidelines on how to proceed in real situations and has significant potential applicability. More worked examples would help; their absence may be partially explained by the fact that no example comes from the domains of the funders of the work (the US Department of Energy and the Army Research Office). An excellent set of references is given. I hope subsequent papers will expand the set of examples and indicate more clearly the domains to which the work is applicable.