[Figure: Venn diagram of -ome intersections]

powered by EpiMatrix

CHOPPI integrates several data sources (genome, transcriptome, secretome, and immunome) to identify immunogenicity risks from host contaminant proteins (HCPs) in CHO-based protein production.

CHOPPI is described in:

C. Bailey-Kellogg, A.H. Gutierrez, L. Moise, F. Terry, W. Martin, A. S. De Groot, "CHOPPI: a web tool for the analysis of immunogenicity risk from host cell proteins in CHO-based protein production", Biotechnology & Bioengineering, 2014, in press.

Please cite this paper if you use CHOPPI for your work.

Earlier CHOPPI-based analysis is presented in:

A.H. Gutierrez, L. Moise, F. Terry, K. Dasilva, C. Bailey-Kellogg, W. Martin, A. S. De Groot, "Immunoinformatic Analysis of Chinese Hamster Ovary (CHO) Protein Contaminants in Therapeutic Protein Formulations", Immunoinformatics and Computational Immunology Workshop, 2012.

The CHOPPI web server was developed by Chris Bailey-Kellogg in collaboration with Annie De Groot, Bill Martin, and the other collaborators listed on the paper above. Thanks to EpiVax for use of EpiMatrix and whole-protein immunogenicity evaluation.

We welcome all feedback on how to improve the site and help you apply it to your work.


Search for a CHO protein by its name, id (gi or gb), or amino acid sequence (BLAST).


Identify proteins in intersections of the various -omes. Percent identity is measured against the closest homolog in the specified set, considering only alignments that cover at least the specified percentage of the query protein.
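The identity and coverage rule can be sketched as follows. This is a minimal illustration, not CHOPPI's implementation: the `Hit` record, the helper names, and the sample values are hypothetical. It assumes that coverage means the percentage of query residues spanned by the alignment and that the "closest homolog" is the eligible hit with the highest percent identity.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Hit:
    """One alignment of a query protein to a member of a reference set.

    Hypothetical record; field names are illustrative only.
    """
    subject_id: str
    percent_identity: float  # identity over the aligned region, 0-100
    query_start: int         # 1-based, inclusive
    query_end: int           # 1-based, inclusive


def query_coverage(hit: Hit, query_length: int) -> float:
    """Percentage of the query spanned by the alignment (assumed definition)."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length


def closest_homolog_identity(
    hits: Sequence[Hit], query_length: int, min_coverage: float
) -> Optional[float]:
    """Highest percent identity among hits meeting the coverage threshold.

    Returns None when no hit covers enough of the query.
    """
    eligible = [
        h.percent_identity
        for h in hits
        if query_coverage(h, query_length) >= min_coverage
    ]
    return max(eligible) if eligible else None


def in_intersection(
    hits_by_set: Mapping[str, Sequence[Hit]],
    query_length: int,
    min_identity: float,
    min_coverage: float,
) -> bool:
    """True if the query has a sufficiently close homolog in every set."""
    for hits in hits_by_set.values():
        best = closest_homolog_identity(hits, query_length, min_coverage)
        if best is None or best < min_identity:
            return False
    return True


# Made-up example: a 100-residue query with two hits in one set.
example_hits = [
    Hit("short_but_identical", 95.0, 1, 40),  # covers 40% of the query
    Hit("long_and_similar", 80.0, 1, 90),     # covers 90% of the query
]
print(closest_homolog_identity(example_hits, 100, min_coverage=70.0))
```

With a 70% coverage requirement the short, highly identical hit is ignored, so the reported identity comes from the longer alignment. Lowering the coverage threshold lets the short hit count as the closest homolog instead.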

Protein analysis

A protein page (e.g., beta-2-microglobulin) provides a full analysis of a protein:

id (gi and gb)
We have analyzed CHO K1 [bioproject 69991] and 17A/GY [bioproject 189319]. As protein sequences become available for other strains, we will include them too [alert us].
To focus on proteins expressed by CHO, we have compared the genes against contigs in one of the CHO transcriptome projects [bioproject 66543].
To further focus on proteins that have been translated, we have compared the genes against the sequences (both proteins and glycoproteins) identified by the proteomic analysis of Baycin-Hizal et al.
mouse secretome
To focus on the proteins likely to be secreted, we have compared the genes against mouse secreted proteins identified in the LOCATE database and UniProt.
As a complementary assessment of secretion, we employed SignalP (v4.0, default settings) to identify which genes have predicted signal peptides.
validated HCPs
We have collected experimentally identified CHO host contaminant proteins from the literature. You may submit more for us to include.
To check homology with the human genome, we have compared against the UniProt Reviewed database (downloaded 2012-10-22).
immunogenicity score
Scores are on a scale where values below -20 indicate low risk of immunogenicity and values above 20 indicate high risk.
epitope analysis
We provide a summary of the MHC class II epitopes predicted by EpiMatrix within the protein.
In addition to listing the total number of immunogenic 9mers, we report separate counts for 9mers that have a degenerate human 9mer and for those that do not. Those without a human counterpart are more likely to pose an immunogenicity risk.
Finally, we provide the percentage of 9mer frames (number of amino acids minus 8) that are in epitopes or in unique-to-CHO epitopes. This helps calibrate epitope "density".
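The score scale and the frame percentages on a protein page can be sketched as follows. This is a minimal illustration under stated assumptions, not the EpiMatrix or CHOPPI implementation. All function names are hypothetical, and the "intermediate" label for scores between -20 and 20 is an assumption, since the page names only the low and high bands. The epitope start positions are made-up inputs, and each 9mer frame is assumed to be simply flagged or not flagged as an epitope frame.

```python
from typing import Iterable, Tuple

FRAME_LENGTH = 9  # epitope frames are 9mers


def risk_band(score: float) -> str:
    """Map an immunogenicity score to a risk band using the -20 / 20 cutoffs."""
    if score < -20:
        return "low"
    if score > 20:
        return "high"
    return "intermediate"  # assumption: the page does not name this band


def frame_count(sequence: str) -> int:
    """Number of 9mer frames in a protein: number of amino acids minus 8."""
    return max(len(sequence) - (FRAME_LENGTH - 1), 0)


def epitope_density(
    sequence: str,
    epitope_starts: Iterable[int],
    cho_unique_starts: Iterable[int],
) -> Tuple[float, float]:
    """Percentage of frames flagged as epitopes and as unique-to-CHO epitopes.

    Start positions are 0-based frame indices (hypothetical input format);
    positions outside the valid frame range are ignored.
    """
    n = frame_count(sequence)
    if n == 0:
        return (0.0, 0.0)
    in_range = set(range(n))
    all_frames = set(epitope_starts) & in_range
    unique_frames = set(cho_unique_starts) & in_range
    return (100.0 * len(all_frames) / n, 100.0 * len(unique_frames) / n)


# Made-up example: a 28-residue sequence has 28 - 8 = 20 frames.
toy_sequence = "A" * 28
print(frame_count(toy_sequence))
print(epitope_density(toy_sequence, [0, 5, 11, 12], [5, 12]))
print(risk_band(-25.0), risk_band(3.0), risk_band(40.0))
```

Dividing by the frame count rather than the sequence length keeps the percentages comparable between short and long proteins, which is what makes them usable as a density measure.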