- Computational Approaches to Side-Chain Prediction in Proteins with
Known Backbone Structures
- H. Farid
- M.S. Thesis, Department of Computer Science, State University of
New York at Albany, 1992
- thesis not available on-line
Two algorithms are presented for prediction of side chain configurations in
proteins with known backbone structures. The first algorithm predicts side
chain configurations in small buried cores and the second predicts those
configurations in entire proteins. The principal difficulty facing side
chain prediction algorithms is the combinatorial explosion that arises when
simultaneously predicting even a small number (more than ten) of residues.
One strategy to overcome this combinatorial explosion is to greatly reduce
the number of possible configurations of each residue type to only a few
rotamer positions; this methodology is adopted in the work presented here.
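
To make the scale of this explosion concrete, the following minimal Python sketch counts the joint configurations of a small core when each residue is restricted to a few rotamers. The per-residue rotamer counts are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
from math import prod

# Hypothetical number of rotamers allowed for each of 12 residues
# (illustrative values only, not data from the thesis).
rotamers_per_residue = [3, 9, 3, 27, 9, 3, 9, 3, 9, 27, 3, 9]


def count_configurations(counts):
    """Return the number of joint configurations when every residue
    independently takes one of its allowed rotamers."""
    return prod(counts)


total = count_configurations(rotamers_per_residue)
print(f"{len(rotamers_per_residue)} residues -> {total:,} configurations")
```

Even with only a handful of rotamers per residue, the count grows multiplicatively with each added residue, which is why exhaustive search is only practical for small cores.
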
In order to predict the side chain configurations in protein cores, the
number of unfavorable van der Waals contacts and the rotamer probability
are computed for every possible rotamer configuration. It is then shown
that one of the few configurations (less than 0.0001% of all
configurations) with a small number of unfavorable van der Waals contacts
and high rotamer probability is the "best" rotamer model of the known
native structure. For the eight cores studied, the "best" rotamer models
of the known native structure have an average RMS deviation of 0.477 Å
from their native structures. For entire proteins, it is
computationally intractable to exhaustively search all possible rotamer
configurations. Simulated annealing and Monte Carlo statistical sampling
techniques are employed to overcome the combinatorial explosion of
searching every rotamer configuration of an entire protein. The prediction
algorithm first "anneals" to a minimum "energy state"; a Monte Carlo
sample of 10,000 configurations with energies within one standard
deviation of this minimum energy is then taken. To predict side chain
configurations, an entropy analysis is performed on the sampled Monte
Carlo configurations. The entropy analysis assigns to each residue a
measure of confidence in predictability; if this measure is above a certain
threshold, a prediction is made; otherwise, no attempt is made to predict
the side chain configuration of the residue. For the fifty proteins of known
structure studied, on average, 62.8% of the side chain configurations are
predicted with an accuracy of 71.7%.
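
The whole-protein procedure described above (anneal, sample near the minimum, then score each residue by entropy) can be sketched roughly as follows. This is an illustrative Python sketch, not the thesis implementation: the toy energy function, the geometric cooling schedule, the fixed energy window standing in for "one standard deviation", the confidence measure (one minus normalized entropy), and the 0.5 threshold are all assumptions.

```python
import math
import random
from collections import Counter


def anneal(energy, n_rotamers, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis simulated annealing; state holds one rotamer index per residue."""
    rng = random.Random(seed)
    state = [rng.randrange(n) for n in n_rotamers]
    e = energy(state)
    best, best_e = list(state), e
    for step in range(steps):
        # Geometric cooling from t_start to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(state))
        old = state[i]
        state[i] = rng.randrange(n_rotamers[i])
        new_e = energy(state)
        if new_e <= e or rng.random() < math.exp((e - new_e) / t):
            e = new_e
            if e < best_e:
                best, best_e = list(state), e
        else:
            state[i] = old  # reject the move
    return best, best_e


def sample_near_minimum(energy, n_rotamers, start, e_min, window,
                        n_samples, temperature=0.5, seed=1):
    """Fixed-temperature Metropolis sampling that rejects any move whose
    energy exceeds e_min + window (an assumed stand-in for the thesis's
    "within one standard deviation" criterion)."""
    rng = random.Random(seed)
    state = list(start)
    e = energy(state)
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(state))
        old = state[i]
        state[i] = rng.randrange(n_rotamers[i])
        new_e = energy(state)
        inside = new_e <= e_min + window
        if inside and (new_e <= e or
                       rng.random() < math.exp((e - new_e) / temperature)):
            e = new_e
        else:
            state[i] = old
        samples.append(tuple(state))
    return samples


def confidence(samples, residue, n):
    """One minus the normalized Shannon entropy of a residue's sampled rotamers."""
    counts = Counter(s[residue] for s in samples)
    total = len(samples)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - h / math.log(n) if n > 1 else 1.0


def predict(samples, n_rotamers, threshold=0.5):
    """Predict the most frequent rotamer where confidence passes the
    threshold; return None for residues left unpredicted."""
    predictions = []
    for i, n in enumerate(n_rotamers):
        if confidence(samples, i, n) >= threshold:
            predictions.append(Counter(s[i] for s in samples).most_common(1)[0][0])
        else:
            predictions.append(None)
    return predictions


# Toy system: residues 0 and 1 have a strongly preferred rotamer,
# residues 2 and 3 have a flat energy surface (hypothetical setup).
n_rotamers = [3, 3, 3, 3]
preferred = {0: 1, 1: 2}


def toy_energy(state):
    return 5.0 * sum(state[i] != r for i, r in preferred.items())


best, best_e = anneal(toy_energy, n_rotamers)
samples = sample_near_minimum(toy_energy, n_rotamers, best, best_e,
                              window=1.0, n_samples=10000)
predictions = predict(samples, n_rotamers)
print(predictions)
```

In this toy setup, residues whose sampled rotamers barely vary receive high confidence and are predicted, while residues on a flat energy surface are left unpredicted. That mirrors the trade-off between coverage and accuracy reported in the abstract.
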