J. Li, Z.-P. Yi, M.C. Laskowski, M. Laskowski Jr., and C. Bailey-Kellogg, "Analysis of sequence-reactivity space for protein-protein interactions", Proteins, 2005, 58:661-671.

Sequence-reactivity space is defined by the relationships between amino acid type choices at some residue positions in a protein and the reactivities of the resulting variants. We are studying Kazal superfamily serine proteinase inhibitors, under substitution of any combination of residue types at ten binding-region positions. Reactivities are defined by the standard free energy of association for an inhibitor against an enzyme, and we are interested in both the strength (the free energy value) and specificity (relative free energy values for one inhibitor against different enzymes). Characterizing the structure of such a space poses several interesting questions: (1) how many sequences achieve particular strength and specificity characteristics? (2) what is the best such sequence? (3) what are some nearly-as-good alternatives? (4) what are their common residue type characteristics (e.g. conservation and correlation)? Although these problems are all highly combinatorial in nature, this paper develops an efficient, integrated mechanism to address them under a data-driven model that predicts reactivity for given sequences. We employ sampling and a novel deterministic distribution propagation algorithm, in order to determine both the reactivity distribution and sequence composition statistics; integer programming and a novel branch-and-bound search algorithm, in order to optimize sequences and enumerate near-optimal sequences; and correlation-based sequence decomposition, in order to identify sequence motifs. We demonstrate the value of our mechanism in analyzing the Kazal superfamily sequence-reactivity space, providing insights into the underlying biochemistry and suggesting hypotheses for further experimental consideration. In general, our mechanism offers a valuable tool for investigating the available degrees of freedom in protein design within a combined computational-experimental framework.
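
To make the counting and enumeration questions concrete, the following is a minimal sketch and not the paper's implementation. It assumes something the abstract does not state: that the predictive model is additive, so a sequence's free energy is the sum of independent per-position contributions, discretized here to integers. Under that assumption the reactivity distribution can be propagated position by position without enumerating sequences, and near-optimal sequences can be enumerated by a simple bounded depth-first search. The function names (`propagate_distribution`, `near_optimal`) and the toy contribution table are hypothetical.

```python
from collections import Counter


def propagate_distribution(contrib):
    """Count sequences per total energy under an ASSUMED additive model.

    contrib[i][a] is the (integer, discretized) contribution of residue
    type a at position i. Returns a Counter: total energy -> sequence count.
    """
    dist = Counter({0: 1})  # one empty prefix with energy 0
    for position in contrib:
        nxt = Counter()
        for total, count in dist.items():
            for energy in position.values():
                nxt[total + energy] += count
        dist = nxt
    return dist


def near_optimal(contrib, slack):
    """List sequences whose total is within `slack` of the minimum.

    Branch-and-bound: a prefix is pruned when even the best possible
    completion would exceed the cutoff.
    """
    n = len(contrib)
    # best_rest[i] = minimum achievable sum over positions i..n-1
    best_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + min(contrib[i].values())
    cutoff = best_rest[0] + slack
    results = []

    def extend(i, prefix, total):
        if total + best_rest[i] > cutoff:
            return  # bound: no completion of this prefix can qualify
        if i == n:
            results.append(("".join(prefix), total))
            return
        for residue, energy in contrib[i].items():
            extend(i + 1, prefix + [residue], total + energy)

    extend(0, [], 0)
    return sorted(results, key=lambda r: (r[1], r[0]))


if __name__ == "__main__":
    # Hypothetical toy table: 3 positions, made-up integer contributions
    # (lower = stronger association). Not data from the paper.
    toy = [
        {"A": -3, "G": 0},
        {"L": -5, "K": -2, "E": 1},
        {"S": 0, "T": -1},
    ]
    print(sorted(propagate_distribution(toy).items()))
    print(near_optimal(toy, slack=1))
```

In this sketch the propagation step costs time proportional to the number of positions times the number of distinct energy values times the number of residue types, rather than the number of sequences. That is the point of propagating a distribution instead of sampling or enumerating, though the paper's own algorithm and model may differ in the details.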