Graphical models of proteins

Protein sequences are constrained both at individual residue positions (conservation) and between positions (covariation). Selective pressure to maintain structure and function has constrained sequences over time and across species. The resulting constraints, manifested in sequence-structure-function relationships, can be inferred from the evolutionary record, together with information from available structural studies and functional assays. Identified relationships can then be employed in several "directions", e.g., to predict function from the sequence of a newly discovered protein, to discriminate among predicted structures for a sequence according to functional tests, and to design variant (homologous) protein sequences with related functions.

We are developing approaches to learn and use undirected probabilistic graphical models (also known as Markov random fields) that capture the significant conservation and coupling observable in a multiple sequence alignment. By incorporating structural information, our models can provide mechanistic explanations for observed constraints. By incorporating functional class information, they can perform interpretable classification of new sequences, explaining decisions in terms of the underlying conservation and coupling constraints. By incorporating information about interacting proteins, they can identify "cross-coupling" constraints and make explainable predictions about novel interactions. Finally, the models can be used generatively, to design new sequences that are consistent with the modeled constraints and are thus predicted to be folded and functional.
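To make the notions of conservation and coupling concrete, the following is a minimal sketch, not the method of the papers listed below. It assumes a gap-free toy alignment, uses column entropy as a stand-in measure of conservation and pairwise mutual information as a stand-in measure of coupling, and keeps a pair of positions as a candidate graph edge when its mutual information reaches a threshold. The names `toy_msa`, `coupling_edges`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of symbols.

    Low entropy at a position indicates strong conservation.
    """
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(msa, i, j):
    """Mutual information (bits) between positions i and j.

    Computed as H(i) + H(j) - H(i, j); high values suggest covariation.
    """
    col_i, col_j = column(msa, i), column(msa, j)
    pairs = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(pairs)


def coupling_edges(msa, threshold):
    """List (i, j, MI) for position pairs whose MI is at least threshold.

    This is an assumed, simplistic edge-selection rule for illustration.
    """
    length = len(msa[0])
    edges = []
    for i, j in combinations(range(length), 2):
        mi = mutual_information(msa, i, j)
        if mi >= threshold:
            edges.append((i, j, mi))
    return edges


# Hypothetical toy alignment: position 1 is fully conserved,
# and positions 0 and 3 always change together.
toy_msa = ["ACDA", "ACEA", "GCDG", "GCEG"]

conservation = [entropy(column(toy_msa, i)) for i in range(len(toy_msa[0]))]
edges = coupling_edges(toy_msa, threshold=0.5)
```

In this toy alignment, position 1 has zero entropy (fully conserved), while positions 0 and 3 are the only pair selected as coupled. A real analysis would need to handle gaps, small sample sizes, and indirect correlations, which raw mutual information does not address.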

Papers

  • J. Thomas, N. Ramakrishnan, and C. Bailey-Kellogg, "Graphical models of protein-protein interaction specificity from correlated mutations and interaction data", Proteins, in press.
  • J. Thomas, N. Ramakrishnan, and C. Bailey-Kellogg, "Protein design by sampling an undirected graphical model of residue constraints", IEEE/ACM Transactions on Computational Biology and Bioinformatics, in press.
  • J. Thomas, N. Ramakrishnan, and C. Bailey-Kellogg, "Graphical models of residue coupling in protein families", IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5:183-197.
  • J. Thomas, N. Ramakrishnan, and C. Bailey-Kellogg, "Graphical models of residue coupling in protein families", Proc. BioKDD, 2005. Copyright 2005, ACM.

Current projects