The class imbalance problem arises because real-world data often contains many more examples of one class (the majority class) than of another (the minority class). Such a dataset is called an imbalanced dataset. A classifier induced from such a set tends to have high classification accuracy on majority examples but a high error rate on minority examples.
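To make the effect concrete, here is a minimal sketch in Python. The 95/5 class split, the labels, and the always-predict-majority classifier are illustrative assumptions, not data or methods from the paper under review. It shows how overall accuracy can look good while every minority example is misclassified.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall on the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    pos_total = sum(t == positive for t in y_true)
    pos_hit = sum(t == positive and p == positive
                  for t, p in zip(y_true, y_pred))
    recall = pos_hit / pos_total if pos_total else 0.0
    return accuracy, recall


# Hypothetical dataset: 95 majority examples (label 0), 5 minority examples (label 1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy, minority_recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

In this toy setting the classifier is right on every majority example and wrong on every minority example, which is why accuracy alone is a poor yardstick for imbalanced data.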
A typical imbalance problem is addressed either with an algorithm- or model-oriented approach or with data manipulation techniques. This paper presents a novel approach to the imbalanced data problem. The proposed method, Mahalanobis distance-based sampling (MDS), is highly technical but is clearly explained in the paper.
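For readers unfamiliar with the Mahalanobis distance, the sketch below computes it for two-dimensional points in plain Python. It illustrates only the distance measure itself, not the authors' sampling procedure, which this review does not describe in detail. The function names and the sample points are hypothetical.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and sample covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov):
    """sqrt((x - mean)^T S^{-1} (x - mean)) for S = [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Uses the closed-form inverse of a symmetric 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))


# Hypothetical data that is spread more widely along x than along y.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0)]
mean, cov = mean_and_cov_2d(points)

# Both query points lie at Euclidean distance 2 from the mean (2, 1).
d_along_x = mahalanobis_2d((4.0, 1.0), mean, cov)
d_along_y = mahalanobis_2d((2.0, 3.0), mean, cov)
print(d_along_x, d_along_y)
```

Although the two query points are equally far from the mean in Euclidean terms, the point displaced along the low-variance y direction has the larger Mahalanobis distance, because the measure scales each direction by the spread of the data.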
Chen, Hsu, and Chang compare their method with existing ones and conclude that it can drastically improve classification performance on imbalanced data. However, certain details need further investigation and should motivate future research.