Computing Reviews
Multiclass cancer classification using semisupervised ellipsoid ARTMAP and particle swarm optimization with gene expression data
Xu R., Anagnostopoulos G., Wunsch D.  IEEE/ACM Transactions on Computational Biology and Bioinformatics 4 (1): 65-77, 2007. Type: Article
Date Reviewed: Sep 5 2007

An important objective in clinical oncology is to rapidly determine the type of tumor a patient has, and to predict tumor behavior in order to identify the most appropriate treatment. Traditionally, this has been done by pathologists, who look at the types of cells and other morphological features present in the tumor and surrounding tissues. The availability of microarray technology for measuring the RNA expression level of every gene in the genome has opened the door to characterizing the biology of tumors in a completely different way. Previous studies have demonstrated that genome-wide expression analysis can yield important information about tumor biology that complements the traditional pathological features.

The availability of genome-wide expression data has created a tremendous demand for data mining and machine learning methods that can produce a subset of important features (namely, genes), along with a mathematical model relating variability in those features to variability in tumor class. Xu et al. present a hybrid machine learning approach for identifying gene expression features that are associated with a polytomous tumor endpoint (one with more than two classes). The authors use an adaptive resonance theory (ART) neural network for classification, with a particle swarm optimizer (PSO) for feature selection. They apply this approach to three real datasets, and compare its performance to that of other methods that have been applied to the same data. They show that this approach is competitive with other approaches, such as a probabilistic neural network. The combination of the ART-based neural network with PSO is novel.
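The general idea of wrapping a swarm-based feature search around a classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps in a nearest-centroid classifier for the ellipsoid ARTMAP network, uses a standard binary PSO update with a sigmoid transfer function, and runs on toy Gaussian data. All function names, parameter values (`w`, `c1`, `c2`, `penalty`), and the fitness definition are assumptions made for the example.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected genes."""
    genes = [g for g, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, math.inf
        for label in sorted(set(y)):
            # Centroid of this class, excluding the held-out sample i.
            members = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[g] for r in members) / len(members) for g in genes]
            dist = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def binary_pso(X, y, n_particles=10, n_iter=15, w=0.7, c1=1.5, c2=1.5,
               penalty=0.1, seed=0):
    """Search for a gene subset (0/1 mask) that maximizes penalized accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(mask):
        # Assumed fitness: accuracy minus a small penalty on subset size.
        return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask) / n_genes

    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_data(n_per_class=10, n_genes=20, n_informative=3, seed=1):
    """Three classes; only the first n_informative genes carry a class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * label if g < n_informative else 0.0, 1.0)
                      for g in range(n_genes)])
            y.append(label)
    return X, y


X, y = toy_data()
mask, score = binary_pso(X, y)
print([g for g, keep in enumerate(mask) if keep], round(score, 3))
```

The size penalty in the fitness function is one common way to push the swarm toward small gene subsets; whether and how the paper penalizes subset size is not stated in this review.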

An important consideration when evaluating and comparing data mining and machine learning methods is whether the methods that seem to perform best are actually finding the true signal in the noise. This is very difficult to assess when the benchmarking is performed on real datasets, since the truth is not knowable. The alternative is to compare methods using simulated data (where the signal is engineered into a noisy dataset). The challenge with simulated or artificial data is that engineering a realistic pattern into the data may be difficult, and there are typically many assumptions made that might not be valid. However, simulated data seems like a good starting point for any new method. It provides an important baseline for performance, prior to the analysis of real data. Once a new method is applied to real data, and compared to other methods, the gold standard should be the biological interpretation (rather than the classification accuracy, for example). A method that produces a good classifier with a biologically meaningful model will be more valuable to a clinical oncologist than a good classifier that can’t be interpreted. These are all important things to keep in mind when developing and evaluating new classification methods for high-dimensional biological datasets.
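The simulated-data baseline described above can be sketched in a few lines. The example below is a hypothetical setup, not taken from the paper: it engineers a class-dependent mean shift into a handful of genes in otherwise Gaussian noise, ranks genes with a simple one-way ANOVA F ratio (a stand-in for whatever selection method is being evaluated), and reports the fraction of the engineered genes that the method recovers. The sample sizes, effect size, and function names are all assumptions.

```python
import random
from statistics import mean


def simulate(n_per_class=20, n_genes=200, n_informative=5, effect=3.0, seed=0):
    """Return (X, y, informative): noise plus a known signal in the first genes."""
    rng = random.Random(seed)
    informative = set(range(n_informative))
    X, y = [], []
    for label in range(3):  # three tumor classes (a polytomous endpoint)
        for _ in range(n_per_class):
            X.append([rng.gauss(effect * label if g in informative else 0.0, 1.0)
                      for g in range(n_genes)])
            y.append(label)
    return X, y, informative


def f_ratio(values, labels):
    """Between-class over within-class variance (one-way ANOVA F statistic)."""
    grand = mean(values)
    groups = [[v for v, l in zip(values, labels) if l == c]
              for c in sorted(set(labels))]
    between = within = 0.0
    for group in groups:
        m = mean(group)
        between += len(group) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in group)
    return (between / (len(groups) - 1)) / (within / (len(values) - len(groups)))


def recovered_fraction(X, y, informative):
    """Fraction of the engineered genes found among the top-ranked genes."""
    n_genes = len(X[0])
    scores = [f_ratio([row[g] for row in X], y) for g in range(n_genes)]
    top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return len(set(top[:len(informative)]) & informative) / len(informative)


X, y, informative = simulate()
print(recovered_fraction(X, y, informative))
```

Because the informative genes are known by construction, the same harness can score any feature selection method on recovery of the true signal, which is exactly what cannot be done on real tumor data.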

Reviewer: Jason Moore
Review #: CR134698 (0808-0810)
  Editor Recommended
Featured Reviewer
Classifier Design And Evaluation (I.5.2 ... )
Biology And Genetics (J.3 ... )
Applications (I.5.4 )
Models (I.5.1 )