This paper introduces clustering and visualization methods for mining gene expression microarray data. The difficulties caused by the very high dimensionality of microarray data are ameliorated partly by allowing the user to select the initial centers of data clusters, and partly by replacing principal component analysis (PCA) with new methods for two-dimensional (2D) visualization of high-dimensional clustering.
The first improvement on PCA is discriminatory component analysis, which uses class information for clustering and visualization. The second improvement is the PCA-projection pursuit method, which uses another unsupervised approach, the projection pursuit method (PPM), to find interesting projections for visualization. The third improvement is independent component analysis-PPM, which accounts for the unavoidable statistical dependencies in microarray data. The advantages of these new clustering and visualization methods are illustrated by three examples.
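
As a rough illustration of what user-selected initial cluster centers could look like in practice, the sketch below seeds a plain Lloyd-style k-means with centers chosen by hand. This is an assumption for illustration only: the abstract does not name the clustering algorithm, so k-means is swapped in as a stand-in, and the function name `kmeans_with_seeds` and the toy expression values are hypothetical.

```python
import math

def kmeans_with_seeds(points, seeds, max_iter=100):
    """Lloyd-style k-means started from user-supplied centers (hypothetical helper)."""
    centers = [list(s) for s in seeds]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign each point to its nearest current center (Euclidean distance).
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Move each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy "expression profiles": 4 samples x 3 genes (made-up numbers).
profiles = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [5.0, 5.1, 4.9], [5.2, 4.8, 5.0]]
seeds = [profiles[0], profiles[2]]  # initial centers picked by the user
labels, centers = kmeans_with_seeds(profiles, seeds)
```

Starting from hand-picked centers removes the dependence on random initialization, which is one plausible reason for letting the user choose them when the data are very high-dimensional.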