Disordered regions play an essential role in proteins: they affect biological properties and behavior, as well as the methods used to investigate them. It is therefore important to determine and predict where these regions lie. The authors report on their method, program, and results for predicting intrinsically disordered regions in proteins.
The method is based on the Protein Data Bank (PDB), a structural database available at the University of California. This is not the authors' first such project: the program DISpro is one of several protein data mining tools developed at the Institute for Genomics and Bioinformatics at the University of California, Irvine. These tools are available over the Internet for non-profit use. Tutorials on bioinformatics topics are also provided, and some of them may help newcomers understand this paper.
The authors have developed a sophisticated method based on recursive neural networks, which takes long-range contextual information into account while keeping the number of weights determined during learning fixed. Using the structural properties, predicted secondary structure class, and predicted relative solvent accessibility of nonredundant protein chains selected from the PDB, the network was trained and tested by ten-fold cross-validation. The resulting network was then tested on the CASP5 targets, which are essentially different from the PDB proteins. On CASP5, the precision of DISpro's predictions exceeds that of the other predictors tested.
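For readers unfamiliar with ten-fold cross-validation, the splitting step can be sketched as follows. This is only a minimal illustration, not the authors' code: the function name `k_fold_splits`, the chain identifiers, and the random seed are all hypothetical, and the sketch covers only the partitioning of chains, not the network training itself.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists so that each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    # Deal the shuffled items into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical chain identifiers standing in for nonredundant PDB chains.
chains = [f"chain_{n:03d}" for n in range(25)]
splits = list(k_fold_splits(chains, k=10))
```

In each of the ten rounds, a model would be trained on `train` and evaluated on `test`, so every chain contributes to the evaluation exactly once and never appears in its own training set.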
Finally, the paper lists some ideas for refining the method, such as treating short and long disordered regions separately (the authors show, using DISpro, that they behave differently), and also presents predictions for homologous proteins. Further variations of the method could be incorporated into protein tertiary structure prediction. The method might also be used to cross-relate different types of protein databases: structural, pathway, and protein interaction.
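The idea of treating short and long disordered regions separately can be illustrated with a small sketch that groups per-residue labels into regions and tags each by length. Everything here is an assumption made for illustration: the `D`/`O` label encoding, the function name `disordered_regions`, and in particular the 30-residue cutoff, which is not taken from the paper.

```python
from itertools import groupby


def disordered_regions(labels, long_threshold=30):
    """Return (start, end, kind) for each run of 'D' in a label string.

    labels: one character per residue, 'D' = disordered, 'O' = ordered.
    start is 0-based inclusive, end is exclusive.
    long_threshold is an assumed cutoff, not a value from the paper.
    """
    regions = []
    pos = 0
    for state, run in groupby(labels):
        length = len(list(run))
        if state == "D":
            kind = "long" if length >= long_threshold else "short"
            regions.append((pos, pos + length, kind))
        pos += length
    return regions


# Hypothetical labelling: a 4-residue and a 35-residue disordered stretch.
example = "O" * 5 + "D" * 4 + "O" * 10 + "D" * 35 + "O" * 3
regions = disordered_regions(example)
```

With regions separated this way, short and long stretches could be scored or modeled independently, which is the kind of refinement the paper suggests.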