Deoxyribonucleic acid (DNA) microarray data typically consists of a two-dimensional array in which each entry gives the expression level of a given gene in a given experiment. Clustering methods are used to find sets of genes, or of experiments, that have similar patterns of expression. In supervised classification, each experiment corresponds to a clinical sample, and a grouping of the samples according to phenotype is given in advance. Feature selection is important in both clustering and supervised classification. The problem is to determine a rule, or set of rules, that distinguishes the different classes on the basis of their gene expression patterns and that can be used to classify new clinical samples.
In this paper, the authors offer a solution to this problem. They argue that perceptrons are a valuable tool for accurately classifying microarray data, but that two factors hamper their use: the large input layers such data require, and the small number of samples available for training. They propose to remedy this by balancing the number of training samples against the size of the input layer, using their self-organizing tree algorithm (SOTA), a clustering method, together with a genetic algorithm for gene selection.
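The review does not describe the authors' genetic algorithm in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: a genetic algorithm searches over binary gene masks, and each mask is scored by how well a single-layer perceptron classifies the training samples using only the selected genes, minus a small penalty per gene to keep the input layer small. The operators (truncation selection, one-point crossover, bit-flip mutation), the parameter values, the penalty, and all function names are hypothetical choices for illustration, not the authors' settings. The SOTA clustering step is omitted; the input columns can be thought of as already-reduced features. Scoring on the training samples themselves overfits easily when samples are scarce, so a real study would use cross-validation instead.

```python
import random

def train_perceptron(X, y, genes, epochs=50, lr=0.1):
    """Classic perceptron on the selected gene columns; labels y are +1/-1."""
    w, b = [0.0] * len(genes), 0.0
    for _ in range(epochs):
        errors = 0
        for row, label in zip(X, y):
            s = b + sum(w[k] * row[g] for k, g in enumerate(genes))
            if (1 if s > 0 else -1) != label:
                # Misclassified: move the weights toward the correct side.
                for k, g in enumerate(genes):
                    w[k] += lr * label * row[g]
                b += lr * label
                errors += 1
        if errors == 0:  # converged on the training samples
            break
    return w, b

def accuracy(X, y, genes, w, b):
    """Fraction of samples the perceptron classifies correctly."""
    correct = 0
    for row, label in zip(X, y):
        s = b + sum(w[k] * row[g] for k, g in enumerate(genes))
        correct += (1 if s > 0 else -1) == label
    return correct / len(y)

def fitness(mask, X, y, penalty=0.01):
    """Hypothetical fitness: training accuracy minus a per-gene penalty."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    w, b = train_perceptron(X, y, genes)
    return accuracy(X, y, genes, w, b) - penalty * len(genes)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve binary gene masks; return indices of genes in the best mask."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    cache = {}  # masks recur often, so memoize their fitness

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation: each gene is toggled with probability p_mut.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [g for g, bit in enumerate(best) if bit]
```

Here `X` is a list of samples (rows of expression values) and `y` the corresponding +1/-1 phenotype labels; `ga_select(X, y)` returns the indices of the genes whose columns would form the perceptron's input layer. Because the surviving half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next.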
This paper will be of more interest to a bioinformaticist than to a computer scientist.