T. Wang, A.N. Kettenbach, S.A. Gerber, and C. Bailey-Kellogg, "MMFPh: A Maximal Motif Finder for Phosphoproteomics Datasets", Bioinformatics, 2012, 28:1562-1570. [pubmed] [paper]

MOTIVATION: Protein phosphorylation, driven by specific recognition of substrates by kinases and phosphatases, plays central roles in a variety of important cellular processes such as signaling and enzyme activation. Mass spectrometry enables the determination of phosphorylated peptides (and thereby proteins) in scenarios ranging from targeted in vitro studies to in vivo cell lysates under particular conditions. The characterization of commonalities among identified phosphopeptides provides insights into the specificities of the kinases involved in a study. Several algorithms have been developed to uncover linear motifs representing position-specific amino acid patterns in sets of phosphopeptides. To more fully capture the available information, reduce sensitivity to both parameter choices and natural experimental variation, and develop more precise characterizations of kinase specificities, it is necessary to determine all statistically significant motifs represented in a dataset.

RESULTS: We have developed MMFPh (Maximal Motif Finder for Phosphoproteomics datasets), which extends the approach of the popular phosphorylation motif software Motif-X (Schwartz and Gygi, 2005) to identify all statistically significant motifs and return the maximal ones (those not subsumed by motifs with more fixed amino acids). In tests with both synthetic and experimental data, we show that MMFPh finds important motifs missed by the greedy approach of Motif-X, while also finding more motifs that are more characteristic of the dataset relative to the background proteome. Thus MMFPh is in some sense both more sensitive and more specific in characterizing the involved kinases. We also show that MMFPh compares favorably to other recent methods for finding phosphorylation motifs. Furthermore, MMFPh is less dependent on parameter choices. We support this powerful new approach with a web interface so that it may become a useful tool for studies of kinase specificity and phosphorylation site prediction.