General

Q: What does EPPIC stand for?

Evolutionary Protein-Protein Interface Classifier.

Q: What can I use EPPIC for?

EPPIC's main aim is to predict the likely quaternary structure of proteins in crystals. It bases its predictions on evolutionary scoring of the pairwise interfaces present in the crystal. In addition, it provides information of general use in a number of structural biology applications:

  • Precomputed Multiple Sequence Alignments (MSAs) of closely related homologs (within 60% sequence identity) for every protein in the PDB or for user-uploaded structures
  • Number and distribution of core residues in interfaces
  • Symmetry features of interfaces: the crystallographic operators generating each interface are provided and are depicted in red if they are conducive to infinite assemblies. The isologous character of each interface cluster is also indicated with an icon; absence of the icon indicates that the interface cluster is heterologous

Q: Is it open-source? Is it published open-access?

EPPIC is being developed as a fully open-source tool within the framework of the SNF-funded, use-inspired project "Molecular evolution for structural biology: analyzing and predicting protein-protein interactions". Since 2012 we have been following an open-access policy for publications related to this project (see Publications page).

Q: How should I cite EPPIC?

The main paper describing the EPPIC method is Bliven S, Lafita A, Parker A, Capitani G, Duarte JM, PLoS Computational Biology 2018. In addition to citing the paper, it is important to cite the EPPIC software version (shown at the top right, below the logo, on the results page) and, even more importantly, the UniProt database version (monthly releases) used for the evolutionary analysis. Our predictions depend on the sequence homologs found by searching the UniProt database, so the predictions can change with every release of the database. UniProt is growing very fast, so a few months of UniProt releases between two runs can make a difference.

Q: How do I know if a prediction is reliable? Does EPPIC produce confidence values?

EPPIC produces confidence estimates for both assemblies and interfaces. There are 3 levels of prediction confidence: high confidence, depicted with a golden star; medium confidence, depicted with a gray star; and low confidence, depicted with no star.

Besides the confidence values, a few indicators help to judge whether a prediction can be trusted:

  • The more sequence homologs the better: check the number of homologs in the Sequences table. If it is below 10, EPPIC will not produce evolutionary scores at all and the prediction will be purely geometrical (less reliable). If it is above 10 but not much larger, the results should be treated with care.
  • The sequence homologs should span a wide range of identities: check their distribution in the MSA. A typical problem case is that of some microbial proteins for which only very close sequence homologs exist (>90% identity) and nothing else is found in sequence databases until <40% identity. The resulting alignment contains less information, so the prediction is less reliable.
  • The stronger the consensus the better: if the two indicators are unanimous in their calls the prediction can be considered more reliable.
  • Proteins that are not full-length (domains or fragments) are predicted less reliably.
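
The first two checks in the list above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of EPPIC: the function names are hypothetical, the homolog count and percent identities are assumed to have been read off the Sequences table and the MSA by hand, and the `caution_below` cutoff is an arbitrary placeholder for "not much larger than 10".

```python
from typing import Optional, Sequence, Tuple

MIN_HOMOLOGS = 10  # below this, the FAQ says no evolutionary scores are produced


def homolog_count_status(num_homologs: int, caution_below: int = 30) -> str:
    """Classify a homolog count.

    caution_below is a made-up illustrative cutoff, not an EPPIC parameter.
    """
    if num_homologs < MIN_HOMOLOGS:
        return "geometry only"
    if num_homologs < caution_below:
        return "evolutionary, use with care"
    return "evolutionary"


def largest_identity_gap(identities: Sequence[int]) -> Optional[Tuple[int, int, int]]:
    """Return (gap, lower, upper) for the widest gap between consecutive
    percent identities, or None if fewer than two values are given."""
    ordered = sorted(identities)
    if len(ordered) < 2:
        return None
    # Compare each identity with the next-highest one and keep the widest gap.
    return max((hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:]))
```

For homologs at 95, 92, 38 and 35% identity, `largest_identity_gap` reports a 54-point gap between 38% and 92%, which is the microbial-protein pattern described in the second bullet.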

Usage

Q: When submitting my own PDB/mmCIF file, can I later access the results for the job? How about data privacy?

Every new job run on the server is assigned a unique job identifier, a long alphanumeric string that is known only to the user and is extremely hard to guess, thus safeguarding data privacy. The jobs and input data are stored on our servers for 1 month and then deleted. We recommend providing an email address when submitting, so that you receive the URL with the job identifier in your inbox. Otherwise, you will have to bookmark it or keep a record of it yourself. It is always possible to retrieve the job by using the URL https://www.eppic-web.org/assemblies/<my_job_id>. Whether the job is still running or already done, the URL will show its current status and automatically display the final results once it has finished. You can even share the URL with colleagues.

Q: Can I directly link to the precomputed results for a PDB entry?

The PDB-wide precomputed results can be accessed directly by using the permanent URLs: https://www.eppic-web.org/assemblies/<PDB_code>
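
Job identifiers and PDB codes use the same URL pattern, so a link can be built with one small helper. The sketch below is a convenience under stated assumptions, not an EPPIC client: the function names are hypothetical, the check covers only the classic 4-character PDB code format, and the identifier is passed through unchanged because the FAQ does not say how the server treats letter case.

```python
import re
from urllib.parse import quote

BASE_URL = "https://www.eppic-web.org/assemblies/"  # URL pattern quoted in the FAQ

# Classic 4-character PDB codes: a digit followed by three alphanumerics.
_PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_code(identifier: str) -> bool:
    """True if the identifier has the classic PDB code shape (format check only)."""
    return _PDB_CODE.fullmatch(identifier) is not None


def assemblies_url(identifier: str) -> str:
    """Build the results URL for a PDB code or a job identifier."""
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("identifier must not be empty")
    # Percent-encode so that an unexpected character cannot alter the URL path.
    return BASE_URL + quote(identifier, safe="")
```

For example, `assemblies_url("1smt")` yields `https://www.eppic-web.org/assemblies/1smt`; the code `1smt` is used here purely as a sample value.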

Q: What does the UniProt version on the Sequence Table tab label indicate?

The label (e.g. 2013_11) denotes the UniProt release used to calculate the EPPIC results. UniProt is searched with BLAST for homologs of the input structure, which are then used to calculate entropy scores. The set of homologs can change slightly from one UniProt release to the next (monthly releases), so the results can vary over time.
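
When comparing two EPPIC runs, it can help to check first whether they used the same UniProt release. The following sketch assumes the label always has the form shown above (four-digit year, underscore, two-digit release number); the function names are hypothetical and not part of EPPIC.

```python
import re
from typing import Tuple


def parse_uniprot_version(label: str) -> Tuple[int, int]:
    """Split a label such as '2013_11' into (year, release number)."""
    match = re.fullmatch(r"(\d{4})_(\d{2})", label.strip())
    if match is None:
        raise ValueError(f"unexpected UniProt version label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def same_uniprot_release(label_a: str, label_b: str) -> bool:
    """True if both runs were computed against the same UniProt release."""
    return parse_uniprot_version(label_a) == parse_uniprot_version(label_b)
```

Because the parsed labels are `(year, release)` tuples, they also sort chronologically, so `parse_uniprot_version("2013_11") < parse_uniprot_version("2014_02")` holds. If two runs differ in release, differences in their predictions may simply reflect a changed set of homologs.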