Computing Reviews
Accurate prediction of protein disordered regions by mining protein structure data
Cheng J., Sweredoski M., Baldi P. Data Mining and Knowledge Discovery 11(3): 213-222, 2005. Type: Article
Date Reviewed: Oct 11 2006

Disordered regions play an essential role in proteins, both in their biological properties and behavior and in the methods used to investigate them. It is therefore important to determine and predict where these regions lie. The authors report on their method, program, and results for predicting intrinsically disordered regions in proteins.

The method is based on the structural Protein Data Bank (PDB) available at the University of California. This is not the authors' first such project: the program DISpro is one of several protein data mining tools developed at the Institute for Genomics and Bioinformatics at the University of California, Irvine. These tools are available over the Internet for non-profit use. Tutorials on bioinformatics topics are also provided, and some of them can help newcomers understand this paper.

The authors have developed a sophisticated method based on recursive neural networks, which take long-range contextual information into account while keeping the number of weights determined during learning fixed. Starting from the structural properties, predicted secondary structure class, and predicted relative solvent accessibility of nonredundant protein chains selected from the PDB, the network was trained and tested by ten-fold cross-validation. The resulting network was then tested on CASP5, which contains proteins essentially different from those drawn from the PDB. On CASP5, the prediction precision of DISpro exceeds that of the other predictors tested.
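
As a rough illustration of the evaluation protocol described above, the following sketch splits a set of protein chains into ten folds and computes a per-residue precision. It is not the authors' code: the function names, the chain identifiers, and the 'D'/'O' (disordered/ordered) label encoding are all assumptions made for this example.

```python
import random


def ten_fold_splits(chain_ids, k=10, seed=0):
    """Yield (train, test) lists of chain IDs for k-fold cross-validation.

    Splitting is done per chain, so all residues of a chain stay together.
    """
    ids = list(chain_ids)
    random.Random(seed).shuffle(ids)
    # Fold i takes every k-th chain starting at position i.
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test


def precision(predicted, actual):
    """Fraction of residues predicted disordered ('D') that really are.

    Both arguments are equal-length strings of 'D' and 'O' labels
    (a hypothetical encoding chosen for this sketch).
    """
    true_pos = sum(p == "D" and a == "D" for p, a in zip(predicted, actual))
    pred_pos = predicted.count("D")
    return true_pos / pred_pos if pred_pos else 0.0
```

Each chain appears in exactly one test fold, so every chain is scored once by a network that never saw it during training.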

Finally, the paper lists some ideas for refining the method, such as treating short and long disordered regions separately (the authors show, using DISpro, that the two behave differently), and also presents predictions for homologous proteins. Further variations of the method could be incorporated into protein tertiary structure prediction. The method might also be used to cross-relate different types of protein databases: structural, pathway, and protein interaction.
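
To make the short/long distinction concrete, here is a minimal sketch that extracts disordered regions from a per-residue label string and separates them by length. The 'D'/'O' encoding, the function names, and the 30-residue threshold are assumptions chosen for illustration, not values taken from the paper.

```python
from itertools import groupby


def disordered_regions(labels):
    """Return (start, end) pairs, end exclusive, for each run of 'D' labels."""
    regions = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == "D":
            regions.append((pos, pos + length))
        pos += length
    return regions


def split_by_length(regions, long_threshold=30):
    """Separate regions into (short, long) lists.

    long_threshold is a hypothetical cutoff in residues.
    """
    short = [r for r in regions if r[1] - r[0] < long_threshold]
    long_ = [r for r in regions if r[1] - r[0] >= long_threshold]
    return short, long_
```

With the two groups separated, each could in principle be evaluated or modeled on its own, which is the kind of refinement the paper suggests.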

Reviewers: K. Balogh, Zsofia Balogh
Review #: CR133422 (0708-0813)
Data Mining (H.2.8 ... )
Biology And Genetics (J.3 ... )
Pattern Analysis (I.5.2 ... )
Pattern Matching (F.2.2 ... )
Structural (I.5.1 ... )
Design Methodology (I.5.2 )
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®