In developing improved protein variants by site-directed mutagenesis or recombination, there are often competing objectives that must be considered in designing an experiment (selecting mutations or breakpoints): stability vs. novelty, affinity vs. specificity, activity vs. immunogenicity, and so forth. Pareto optimal experimental designs make the best trade-offs between competing objectives. Such designs are not "dominated"; i.e., no other design is better than a Pareto optimal design for one objective without being worse for another objective. Our goal is to produce all the Pareto optimal designs (the Pareto frontier), in order to characterize the trade-offs and suggest designs most worth considering, but to avoid explicitly considering the large number of dominated designs.

The figure above shows site-directed recombination designs selecting 10 breakpoints to trade off between predicted effects on stability (y axis) and diversity (x axis), each assessed by a potential function to be minimized. The design optimizing stability (lower right) places all the breakpoints together, so that most of the structure is maintained, while that optimizing diversity (upper left) spreads them out. Designs between those two extremes make different trade-offs. There are a total of 263 choose 10 ≈ 3.66 × 10^17 designs for 10 breakpoints in this 263-residue protein, so clearly we must find the Pareto optimal ones without enumerating the others.
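To make the definition of dominance concrete, here is a minimal Python sketch (an illustration only, not part of the Pepfr C++ code). It assumes each design is summarized by a tuple of objective scores, all to be minimized; the function names `dominates` and `pareto_filter` and the toy scores are hypothetical. The brute-force filter compares every pair of designs, which is exactly what becomes infeasible at the scale of the 263-choose-10 design space counted at the end.

```python
from math import comb


def dominates(a, b):
    """True if design a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_filter(points):
    """Brute-force Pareto frontier: keep the points that no other point dominates."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


# Toy (diversity score, stability score) pairs; made-up numbers for illustration.
toy_designs = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5)]
toy_frontier = pareto_filter(toy_designs)
print(toy_frontier)

# Number of ways to choose 10 breakpoints in a 263-residue protein.
n_designs = comb(263, 10)
print(f"{n_designs:.3e}")
```

The pairwise filter is quadratic in the number of designs, so it is only usable on small, already enumerated sets; it serves here as a reference definition rather than a practical method.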

We have developed a divide-and-conquer algorithm, Pepfr (Protein Engineering Pareto Frontier), that hierarchically subdivides the objective space, employing appropriate dynamic programming or integer programming methods to optimize designs in different regions. This divide-and-conquer approach is efficient in that the number of divisions (and thus calls to the optimizer) is directly proportional to the number of Pareto optimal designs. We have demonstrated Pepfr with three protein engineering case studies: site-directed recombination for stability and diversity via dynamic programming (using the inputs of Zheng et al.), site-directed mutagenesis of interacting proteins for affinity and specificity via integer programming (using the inputs of Grigoryan et al.), and site-directed mutagenesis of a therapeutic protein for activity and immunogenicity via integer programming (using the inputs of Parker et al.). Pepfr is able to effectively produce all the Pareto optimal designs, discovering many more designs than previous methods. The characterization of the Pareto frontier provides additional insights into the local stability of design choices as well as global trends leading to trade-offs between competing criteria.
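The subdivision idea can be sketched for two minimized objectives as follows. This is a simplified Python illustration under stated assumptions, not the Pepfr implementation: the constrained optimizer is replaced by a brute-force search over a small explicit list of score pairs (standing in for the dynamic programming or integer programming solvers), the in-region objective is an unweighted sum chosen for simplicity, and all names (`pareto_frontier`, `optimize`, `split`, `DESIGNS`) are hypothetical.

```python
def pareto_frontier(designs):
    """Return (frontier, optimizer_calls) for designs given as (f1, f2) score
    pairs, with both objectives minimized."""
    points = set(designs)
    calls = 0

    def optimize(x_lo, x_hi, y_lo, y_hi, key):
        # Stand-in for a constrained DP/IP solve: best design strictly inside
        # the open box of objective space, or None if the box is empty.
        nonlocal calls
        calls += 1
        inside = [p for p in points if x_lo < p[0] < x_hi and y_lo < p[1] < y_hi]
        return min(inside, key=key) if inside else None

    def split(a, b, found):
        # a is the upper-left corner design, b the lower-right one. Any design
        # found strictly between them is Pareto optimal and divides the box
        # into two smaller boxes; an empty box ends the recursion.
        p = optimize(a[0], b[0], b[1], a[1], key=lambda d: (d[0] + d[1], d[0]))
        if p is None:
            return
        found.append(p)
        split(a, p, found)
        split(p, b, found)

    inf = float("inf")
    # The two extremes: best in each single objective, ties broken by the other.
    best_x = optimize(-inf, inf, -inf, inf, key=lambda d: (d[0], d[1]))
    best_y = optimize(-inf, inf, -inf, inf, key=lambda d: (d[1], d[0]))
    if best_x is None:
        return [], calls
    found = [best_x]
    if best_y != best_x:
        found.append(best_y)
        split(best_x, best_y, found)
    return sorted(found), calls


# Toy score pairs (made-up numbers), including several dominated designs.
DESIGNS = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 3),
           (7, 6), (8, 1), (9, 2), (2, 9), (4, 7)]
frontier, calls = pareto_frontier(DESIGNS)
print(frontier, calls)
```

In this sketch every optimizer call either finds a new Pareto optimal design or proves a box empty, so a frontier of n ≥ 2 designs costs 2n − 1 calls, which illustrates why the number of calls grows in proportion to the size of the frontier rather than the size of the design space.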

Our approach is described in the paper "A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments", by Lu He, Alan M. Friedman, and Chris Bailey-Kellogg, Proteins 2012. This page provides a repository of the code, instructions, and case study inputs, so that you may use Pepfr in your own protein engineering applications. If your application has the same form and objectives as those we have already coded, you should be able to simply provide new input files. If you need to modify the objectives or implement a new objective, you will have to extend the C++ classes. We provide instructions for each of these cases. We hope that you find Pepfr easy to install and run, but please contact Chris Bailey-Kellogg with any questions or problems.


Proteins. 2012 Mar;80(3):790-806. doi: 10.1002/prot.23237. Epub 2011 Dec 16.
A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments.
He L, Friedman AM, Bailey-Kellogg C.
PMID: 22180081