Computational Approaches to Side-Chain Prediction in Proteins with Known Backbone Structures
H. Farid
M.S. Thesis, Department of Computer Science, State University of New York at Albany, 1992

thesis not available on-line


Two algorithms are presented for prediction of side chain configurations in proteins with known backbone structures. The first algorithm predicts side chain configurations in small buried cores and the second predicts those configurations in entire proteins. The principal difficulty facing side chain prediction algorithms is the combinatorial explosion that arises when simultaneously predicting even a small number (more than ten) of residues. One strategy to overcome this combinatorial explosion is to greatly reduce the number of possible configurations of each residue type to only a few rotamer positions; this methodology is adopted in the work presented here. In order to predict the side chain configurations in protein cores, the number of unfavorable van der Waals contacts and the rotamer probability are computed for every possible rotamer configuration. It is then shown that one of the few configurations (less than 99.9999% of all configurations) with a small number of unfavorable van der Waals contacts and high rotamer probability is the "best" rotamer model of the known native structure. For the eight cores studied, the "best" rotamer models of the known native structure have an average rms deviation of 0.477 Å from their native structures. For entire proteins, it is computationally intractable to exhaustively search all possible rotamer configurations. Simulated annealing and Monte Carlo statistical sampling techniques are employed to overcome the combinatorial explosion of searching every rotamer configuration of an entire protein. The prediction algorithm "anneals" to a minimum "energy state"; a Monte Carlo sample of 10,000 configurations with energies within one standard deviation of this minimum energy is then taken. In order to predict side chain configurations, an entropy analysis of the sampled Monte Carlo configurations is performed.
The entropy analysis assigns to each residue a measure of confidence in predictability; if this measure is above a certain threshold, a prediction is made; otherwise, no attempt is made to predict the side chain configuration of the residue. For the fifty proteins of known structure studied, on average, 62.8% of the side chain configurations are predicted with an accuracy of 71.7%.
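The abstract does not give the exact confidence measure or threshold, so the following is only a minimal sketch of how an entropy analysis over sampled configurations might be set up. It assumes that each sampled configuration is a tuple of rotamer indices (one per residue), that confidence is taken to be high when the Shannon entropy of a residue's sampled rotamers is low, and that the most frequent rotamer is the prediction. The names `residue_entropy`, `predict_side_chains`, and `max_entropy`, and the cutoff of 0.5 bits, are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter


def residue_entropy(samples, position):
    """Shannon entropy (in bits) of the rotamers sampled at one residue."""
    counts = Counter(config[position] for config in samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def predict_side_chains(samples, max_entropy=0.5):
    """Predict a rotamer only for residues whose entropy is low enough.

    Assumption: low entropy stands in for "high confidence"; residues
    above the (hypothetical) cutoff are left unpredicted.
    """
    predictions = {}
    for position in range(len(samples[0])):
        if residue_entropy(samples, position) <= max_entropy:
            counts = Counter(config[position] for config in samples)
            predictions[position] = counts.most_common(1)[0][0]
    return predictions


# Toy sample: four configurations of a three-residue protein.
samples = [
    (0, 1, 2),
    (0, 1, 0),
    (0, 2, 1),
    (0, 1, 2),
]

print(predict_side_chains(samples))  # {0: 0}
```

In this toy sample, residue 0 always adopts rotamer 0 (entropy 0 bits) and is predicted, while residues 1 and 2 vary too much across the sample and are skipped. Loosening the cutoff trades coverage against accuracy, which is the trade-off reflected in the reported 62.8% coverage and 71.7% accuracy.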

