D. Verma, G. Grigoryan, and C. Bailey-Kellogg, "Pareto optimization of combinatorial mutagenesis libraries", IEEE/ACM Trans. Comp. Biol. Bioinf., 2018.

To increase the hit rate of discovering diverse, beneficial protein variants via high-throughput screening, we have developed a computational method to optimize combinatorial mutagenesis libraries for overall enrichment in two distinct properties of interest. Given scoring functions for evaluating individual variants, POCoM (Pareto Optimal Combinatorial Mutagenesis) scores entire libraries in terms of averages over their constituent members, and designs optimal libraries as sets of mutations whose combinations make the best trade-offs between average scores. This approach enables general-purpose, rigorous, and very fast optimization of large libraries (e.g., 30-mutation, billion-member libraries in only hours). Here we instantiate POCoM with scores based on a target's protein structure and its homologs' sequences, enabling the design of libraries containing variants that balance these two important yet quite different types of information. We demonstrate POCoM's generality and power in case study applications to green fluorescent protein, cytochrome P450, and β-lactamase. Analysis of the POCoM library designs provides insights into the trade-offs between structure- and sequence-based scores, as well as the impacts of experimental constraints on library designs. POCoM libraries incorporate mutations that have previously been found favorable experimentally, while diversifying the contexts in which these mutations are situated and maintaining overall variant quality.
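The two ideas in this abstract, scoring a library by the average over its members and keeping the libraries that make the best trade-off between two such averages, can be illustrated with a small toy sketch. It is not POCoM's optimization procedure. Everything in it is a hypothetical assumption made for illustration: the `SCORES` and `WILD_TYPE` tables, the helper names, purely additive (site-wise) variant scores, at most one extra amino acid per position, and brute-force enumeration of candidate libraries. Under the additivity assumption, the mean score over a full combinatorial library equals the sum of per-position mean site scores, so the average can be computed without enumerating every member.

```python
from itertools import product
from statistics import mean

# Toy sketch, not POCoM's algorithm. Hypothetical site-wise scores
# (higher = better) for two properties: SCORES[prop][position][amino_acid].
SCORES = {
    "structure": {
        1: {"A": 0.0, "V": 0.8, "G": -0.5},
        2: {"L": 0.0, "I": 0.3, "F": 0.6},
        3: {"K": 0.0, "R": 0.2, "E": -0.4},
    },
    "sequence": {
        1: {"A": 0.0, "V": -0.2, "G": 0.7},
        2: {"L": 0.0, "I": 0.5, "F": -0.3},
        3: {"K": 0.0, "R": 0.4, "E": 0.6},
    },
}
WILD_TYPE = {1: "A", 2: "L", 3: "K"}  # hypothetical wild-type residues


def variant_score(variant, prop):
    """Score one variant ({position: amino acid}) as a sum of site scores (assumed additive)."""
    return sum(SCORES[prop][pos][aa] for pos, aa in variant.items())


def library_average_enumerated(library, prop):
    """Average score over every member of the combinatorial library (brute force)."""
    positions = sorted(library)
    members = product(*(library[pos] for pos in positions))
    return mean(variant_score(dict(zip(positions, aas)), prop) for aas in members)


def library_average_fast(library, prop):
    """Same average without enumeration: for an additive score, the library mean
    is the sum over positions of the mean site score of the allowed amino acids."""
    return sum(mean(SCORES[prop][pos][aa] for aa in aas) for pos, aas in library.items())


def candidate_libraries(n_mutations):
    """Yield libraries adding exactly n_mutations amino acids beyond wild type,
    with at most one extra amino acid per position (a toy restriction)."""
    per_position = []
    for pos in sorted(WILD_TYPE):
        wt = WILD_TYPE[pos]
        options = [(wt,)] + [(wt, aa) for aa in SCORES["structure"][pos] if aa != wt]
        per_position.append([(pos, opt) for opt in options])
    for combo in product(*per_position):
        library = dict(combo)
        if sum(len(aas) - 1 for aas in library.values()) == n_mutations:
            yield library


def pareto_front(scored):
    """Keep (library, a, b) entries that no other entry dominates in both scores."""
    return [
        (lib, a, b)
        for lib, a, b in scored
        if not any(a2 >= a and b2 >= b and (a2 > a or b2 > b) for _, a2, b2 in scored)
    ]


scored = [
    (lib, library_average_fast(lib, "structure"), library_average_fast(lib, "sequence"))
    for lib in candidate_libraries(n_mutations=2)
]
front = pareto_front(scored)
for lib, a, b in sorted(front, key=lambda t: -t[1]):
    print(lib, f"structure={a:.2f}", f"sequence={b:.2f}")
```

Each library on the printed front is one trade-off point: moving along the front buys a higher average in one score only by giving up some of the other. Real structure-based scores are generally not purely additive, so in this sketch the shortcut in `library_average_fast` holds only under the stated additivity assumption, and the brute-force search would not scale to libraries of the size quoted above.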