D. Verma, G. Grigoryan, C. Bailey-Kellogg, "OCoM-SOCoM: Combinatorial Mutagenesis Library Design Optimally Combining Sequence and Structure Information", Proc. ACM-BCB, 2016.

High-throughput screening of combinatorial mutagenesis libraries has enabled and accelerated the development of beneficial protein variants. However, since even the most massive library contains only a minuscule fraction of the possible variants of a target protein, it is advantageous to use computational methods to design a library enriched in beneficial variants. Here we develop a general-purpose method, OCoM-SOCoM, that designs large combinatorial libraries simultaneously for both the sequence properties (OCoM) and the structural properties (SOCoM) of their constituent variants. Our algorithm optimizes library designs along the continuum of optimal trade-offs (the Pareto frontier) between two library-level scores that efficiently assess expected structure-based energy (via Cluster Expansion analysis) and sequence-based evolutionary acceptability (via statistical analysis of homologs). Significantly, although the combinatorics of library design are even worse than those of individual protein design, in practice OCoM-SOCoM requires only hours to optimize 10^9-member libraries (for 30 sites) within still larger library design spaces. Case study applications to green fluorescent protein and beta-lactamase provide insights into how sequence-structure relationships drive differences in library composition, and demonstrate that OCoM-SOCoM libraries incorporate mutations previously found experimentally to be valuable while also providing novel and diverse mutational combinations.
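
To make the two quantitative ideas in the abstract concrete, the following minimal Python sketch shows (a) how the size of a combinatorial library grows as the product of the number of amino-acid choices at each site, and (b) how a Pareto frontier can be extracted from a set of candidate designs scored on two objectives. It is an illustration only, not the OCoM-SOCoM algorithm: the function names, the toy score values, the convention that lower is better for both scores, and the choice of two amino acids per site are all assumptions made for this example rather than details taken from the paper.

```python
from math import prod
from typing import List, Tuple


def library_size(choices_per_site: List[int]) -> int:
    """Number of variants in a combinatorial library (hypothetical helper).

    Each site contributes independently, so the size is the product of
    the per-site numbers of allowed amino acids.
    """
    return prod(choices_per_site)


def pareto_frontier(designs: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (structure_score, sequence_score) pairs.

    Assumption for this sketch: lower is better for both scores.
    A design is dominated if another design is at least as good on both
    scores and strictly better on at least one.
    """
    frontier = []
    best_sequence_score = float("inf")
    # Sort by structure score (ties broken by sequence score), then sweep:
    # a design is kept only if it strictly improves the best sequence
    # score seen so far.
    for structure_score, sequence_score in sorted(set(designs)):
        if sequence_score < best_sequence_score:
            frontier.append((structure_score, sequence_score))
            best_sequence_score = sequence_score
    return frontier


# Illustrative assumption: two amino-acid choices at each of 30 sites.
size = library_size([2] * 30)  # 2**30 = 1073741824, on the order of 10^9

# Toy scores for five hypothetical library designs.
toy_designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 6.0)]
frontier = pareto_frontier(toy_designs)
# frontier == [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

In this toy set, the designs (3.0, 4.0) and (2.0, 6.0) are dropped because (2.0, 3.0) is at least as good on both scores; the three remaining designs represent distinct trade-offs, none of which improves one score without worsening the other.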