O. Vitek, C. Bailey-Kellogg, B. Craig, P. Kuliniewicz, and J. Vitek, "Reconsidering complete search algorithms for protein backbone NMR assignment", Bioinformatics (Proc. ECCB), 2005, 21:ii230-236. [preprint]

Motivation: Nuclear magnetic resonance (NMR) spectroscopy is widely used to determine and analyze protein structures. An essential step in NMR studies is determining the backbone resonance assignment, which maps individual atoms to experimentally measured resonance frequencies. Performing assignment is challenging due to the noise and ambiguity in NMR spectra. While automated procedures have been investigated, by-and-large they are still struggling to gain acceptance due to inherent limits in scalability and/or unacceptable levels of assignment error.

To have confidence in the results, an algorithm should be complete, i.e., able to identify all solutions consistent with the data, including all arbitrary configurations of extra and missing peaks. The ensuing combinatorial explosion in the space of possible assignments has led to the perception that complete search is hopelessly inefficient and cannot scale to realistic data sets.

Results: This paper presents a complete branch-contract-and-bound search algorithm for backbone resonance assignment. The algorithm controls the search space by hierarchically agglomerating partial assignments and employing statistically sound pruning criteria. It considers all solutions consistent with the data, and uniformly treats all combinations of extra and missing data.

We demonstrate our approach on experimental data from 5 proteins ranging in size from 70 to 154 residues. The algorithm assigns over 95% of the positions with over 98% accuracy. We also present results on simulated data from 259 proteins from the RefDB database, ranging in size from 25 to 257 residues. The median computation time for these cases is 1 minute, and the assignment accuracy is over 99%.

These results demonstrate that complete search not only has the advantage of guaranteeing fair treatment of all feasible solutions, but is efficient enough to be employed effectively in practice.