Case-based methods have previously been used to assist in treating cancers, and this paper introduces several significant modifications for breast cancer. Specifically, these modifications relate to the way in which the distance between cases is measured. The authors introduce what they call a weighted heterogeneous value distance metric. First, they note that the attributes used to retrieve cases can be of three types: numerical values that lend themselves to a Euclidean distance metric; binary values indicating that something is present or absent, for which an overlap metric is appropriate; and, finally, qualitative discrete values drawn from a set--for example, “highly smooth, moderate smooth, and not smooth at all”--where the steps between values differ. For this last type, the authors use a value difference metric that has a probabilistic flavor. The overall distance between two cases is a weighted combination of these per-attribute differences.
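To make the three attribute types concrete, here is a minimal Python sketch of a weighted heterogeneous distance. It is an illustration under stated assumptions, not the authors' implementation: the range normalization for numeric attributes, the squared class-probability form of the value difference metric, the fallback for unseen values, and all attribute names and toy cases are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_vdm(cases, labels, attr):
    """Estimate P(class | value) for one qualitative attribute."""
    counts = defaultdict(Counter)
    for case, label in zip(cases, labels):
        counts[case[attr]][label] += 1
    classes = set(labels)
    return {
        value: {c: by_class[c] / sum(by_class.values()) for c in classes}
        for value, by_class in counts.items()
    }

def vdm(probs, x, y):
    """Value difference: squared gaps between class-conditional probabilities.

    Not rescaled to [0, 1]; returns 1.0 for unseen values (an assumption).
    """
    if x not in probs or y not in probs:
        return 1.0
    return sum((probs[x][c] - probs[y][c]) ** 2 for c in probs[x])

def weighted_distance(a, b, weights, kinds, ranges, vdm_tables):
    """Combine per-attribute squared differences using attribute weights."""
    total = 0.0
    for attr, kind in kinds.items():
        if kind == "numeric":
            # Euclidean-style term, normalized by the attribute's range.
            d = ((a[attr] - b[attr]) / ranges[attr]) ** 2
        elif kind == "binary":
            # Overlap metric: 0 if equal, 1 otherwise.
            d = 0.0 if a[attr] == b[attr] else 1.0
        else:
            d = vdm(vdm_tables[attr], a[attr], b[attr])
        total += weights[attr] * d
    return math.sqrt(total)

# Hypothetical toy case base; attribute names are invented for illustration.
cases = [
    {"size": 1.0, "calcification": True, "smoothness": "highly smooth"},
    {"size": 3.0, "calcification": False, "smoothness": "not smooth at all"},
    {"size": 2.0, "calcification": True, "smoothness": "moderate smooth"},
    {"size": 5.0, "calcification": False, "smoothness": "not smooth at all"},
]
labels = ["benign", "malignant", "benign", "malignant"]
kinds = {"size": "numeric", "calcification": "binary", "smoothness": "symbolic"}
ranges = {"size": 4.0}
vdm_tables = {"smoothness": fit_vdm(cases, labels, "smoothness")}
weights = {"size": 1.0, "calcification": 1.0, "smoothness": 1.0}

d_similar = weighted_distance(cases[0], cases[2], weights, kinds, ranges, vdm_tables)
d_different = weighted_distance(cases[0], cases[1], weights, kinds, ranges, vdm_tables)
```

In this sketch, two qualitative values are close when they predict the outcome classes with similar probabilities, which is one way to read the "probabilistic flavor" mentioned above.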
The major contribution of the paper lies in the way in which the weights are determined. The authors use a genetic algorithm to learn the weights. The algorithm is quite standard, with fitness measured by prediction accuracy. It is worth noting that in the case of breast cancer, pathological evidence provides definitive answers, so prediction accuracy can be measured against a reliable ground truth. The system has been applied to two studies: one aimed to help oncologists reduce unnecessary biopsies, and the other to predict whether a breast cancer patient has secondary cancers. In both studies, the system outperformed the systems it was compared with.
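The weight-learning idea can be sketched as a small genetic algorithm whose fitness is prediction accuracy. This is a hedged illustration only: the review does not give the authors' operators or parameters, so the leave-one-out nearest-neighbor fitness, truncation selection, one-point crossover, Gaussian mutation, the plain weighted Euclidean stand-in distance, and the toy data are all assumptions.

```python
import math
import random

def weighted_euclidean(a, b, weights):
    """Stand-in distance; a heterogeneous metric could be substituted here."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def loo_accuracy(weights, cases, labels, distance=weighted_euclidean):
    """Fitness: leave-one-out accuracy of 1-nearest-neighbor retrieval."""
    correct = 0
    for i, query in enumerate(cases):
        nearest = min(
            (j for j in range(len(cases)) if j != i),
            key=lambda j: distance(query, cases[j], weights),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(cases)

def learn_weights(cases, labels, n_attrs, pop_size=20, generations=30,
                  mutation_rate=0.1, seed=0):
    """Evolve attribute weights in [0, 1]; all parameters are illustrative."""
    rng = random.Random(seed)

    def fitness(w):
        return loo_accuracy(w, cases, labels)

    population = [[rng.random() for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, n_attrs) if n_attrs > 1 else 0
            child = mum[:cut] + dad[cut:]  # one-point crossover
            # Gaussian mutation, clipped so weights stay in [0, 1].
            child = [
                min(1.0, max(0.0, w + rng.gauss(0, 0.2)))
                if rng.random() < mutation_rate else w
                for w in child
            ]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Hypothetical toy data: the first attribute separates the classes,
# the second is noise.
cases = [(0.1, 0.9), (0.2, 0.1), (0.15, 0.5), (0.8, 0.85), (0.9, 0.15), (0.85, 0.5)]
labels = [0, 0, 0, 1, 1, 1]

best = learn_weights(cases, labels, n_attrs=2)
best_accuracy = loo_accuracy(best, cases, labels)
```

Because the fitter half of each generation is carried over unchanged, the best accuracy in the population never decreases from one generation to the next.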
The paper is clearly written and can be recommended to people interested in case-based reasoning independently of any interest in the specific application described.