Help

EPPIC (Evolutionary Protein-Protein Interface Classifier) aims at predicting the quaternary structure of proteins from crystal structures. It first classifies the interfaces present in a crystal structure to determine whether they are biologically relevant or not. Then it enumerates all topologically valid assemblies in the crystal resulting from different combinations of the interfaces. Finally, it provides a prediction of the most likely quaternary assembly based on the individual interfaces' scores.

In addition to that, it provides information of general use for a number of structural biology applications:

Precomputed Multiple Sequence Alignments (MSAs) of closely related homologs (within 60% sequence identity) for every protein in the PDB or for user-uploaded structures
Number and distribution of core residues in interfaces
Symmetry features of interfaces: the crystallographic operators generating each interface are provided and are depicted in red if they are conducive to infinite assemblies. Also, the isologous character of each interface cluster is indicated with an icon; absence of the icon indicates that the interface cluster is heterologous.

Inputting data

Screenshot: the input box and dropdown

The server has precomputed results for all crystallographic structures in the PDB. If you are interested in a specific PDB id, you can simply enter it and get results immediately. You can also submit your own protein structure by uploading a PDB/mmCIF file (use the drop-down to select "File upload"). If you upload your own file, the calculation will be triggered and the results are produced within a few minutes. Structures with many protein entities (unique sequences) will take longer.

Every new job run on the server is assigned a unique job identifier, a long alphanumerical string that is only known by the user who submits it and that is very hard to guess. This guarantees the privacy of your data (see Job identifiers below).

The assemblies table

All the topologically valid assemblies detected in the crystal are listed. Valid assemblies are those that exhibit point group symmetry and that are isomorphous across the unit cell, i.e. all molecules interact in the same way throughout the lattice.

Each assembly is represented by a thumbnail image in fat ribbon representation with different subunits in different colors (1st column). Also a 2-dimensional graph diagram of the assembly is shown, with nodes being the chains and edges the interfaces between them (2nd column). Distinct molecular entities and distinct interfaces are depicted with different colors. Next appear the macromolecular size, stoichiometry (the successive letters represent different molecular entities, not chain ids) and point group symmetry (Cn for cyclic symmetries, Dn for dihedral symmetries, T for tetrahedral, O for octahedral and I for icosahedral).

Note that disjoint assemblies will be shown separated by commas in the size, symmetry, and stoichiometry columns. Disjoint assemblies are those where not all components of the crystal form a single assembly but instead several disjoint ones, e.g. a crystal containing 2 protein entities will always contain a disjoint assembly formed by 2 independent monomers (one for each entity).
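The following minimal sketch illustrates how the size and stoichiometry columns relate to the chains of an assembly, including the comma-separated form used for disjoint assemblies. It rests on several assumptions. Each disjoint assembly is given as a list of entity ids, and the names `describe`, `e1` and `e2` are hypothetical. Letters are assigned to entities in order of first appearance, and a copy number of 1 is omitted. EPPIC's own lettering rules may differ.

```python
from collections import Counter
from string import ascii_uppercase


def describe(components):
    """Return (size, stoichiometry) strings for a list of disjoint assemblies.

    components: list of lists of entity ids, one inner list per disjoint assembly.
    Assumption: entities are lettered A, B, ... by first appearance across ALL
    components, so the same entity keeps the same letter everywhere.
    """
    letters = {}
    for comp in components:
        for entity in comp:
            if entity not in letters:
                letters[entity] = ascii_uppercase[len(letters)]

    sizes, stoichiometries = [], []
    for comp in components:
        counts = Counter(comp)
        sizes.append(str(len(comp)))
        # Order the entities alphabetically by their letter; omit a count of 1.
        ordered = sorted(counts.items(), key=lambda kv: letters[kv[0]])
        stoichiometries.append(
            "".join(letters[e] + (str(n) if n > 1 else "") for e, n in ordered)
        )
    return ", ".join(sizes), ", ".join(stoichiometries)
```

For example, a heterotetramer of two entities, `describe([["e1", "e1", "e2", "e2"]])`, gives `("4", "A2B2")`. Two independent monomers of different entities, `describe([["e1"], ["e2"]])`, give `("1, 1", "A, B")`.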

The prediction column provides the predicted assembly (marked as BIO), while all others are marked as XTAL. The predicted assembly corresponds to that with the highest calculated probability, based on the scores of the interfaces that form it. The probability values for each assembly appear next to the call. An estimation of the prediction confidence is provided with star icons: golden star for high confidence, gray star for medium confidence, no star for low confidence.
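The selection rule above can be sketched as follows. The highest-probability assembly is called BIO and every other assembly XTAL. The input shape is an assumption: a mapping from a hypothetical assembly id to its probability. The sketch does not reproduce how EPPIC computes those probabilities or its confidence stars.

```python
def call_assemblies(probabilities):
    """Mark the assembly with the highest probability BIO and all others XTAL.

    probabilities: dict of assembly id -> probability (hypothetical input shape).
    On ties, max() keeps the first assembly encountered.
    """
    best = max(probabilities, key=probabilities.get)
    return {aid: ("BIO" if aid == best else "XTAL") for aid in probabilities}
```

With made-up values `{1: 0.05, 2: 0.80, 3: 0.15}`, assembly 2 is called BIO and assemblies 1 and 3 are called XTAL.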

The last column shows how many interfaces compose the assembly. Clicking on its button shows the interfaces table filtered to the interfaces belonging to that assembly. To show all interfaces again, click on the close icon in the tab title.

The interfaces table

This table lists all the pairwise protein-protein interfaces present in the crystal. Clusters of similar interfaces are shown as groups of rows; an interface cluster is a group of interfaces that share a certain degree of contact similarity. For each cluster, a header row displays the cluster id, the number of member interfaces and the members' average area.
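To make "contact similarity" concrete, here is a minimal sketch that groups interfaces by single linkage. Two interfaces are linked when the Jaccard overlap of their residue-contact sets reaches a threshold. This is our own illustration, not necessarily EPPIC's similarity measure. The function names and the 0.2 threshold are hypothetical, and contact pairs are assumed to be expressed in a common residue numbering so that equivalent contacts compare equal.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_interfaces(contacts, threshold=0.2):
    """Single-linkage clustering of interfaces by contact-set overlap.

    contacts: dict interface id -> set of (residue_i, residue_j) contact pairs.
    threshold: hypothetical minimum Jaccard similarity for linking two interfaces.
    """
    parent = {i: i for i in contacts}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(contacts, 2):
        if jaccard(contacts[a], contacts[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in contacts:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single linkage keeps the sketch short. A real implementation would also need to handle the partner order of heteromeric interfaces (A+B versus B+A) before comparing contact sets.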

An assessment of the biological relevance of each pairwise interface is provided, based on geometric and evolutionary criteria. The final assessment, a composite of the other two, is provided in the right-most column.

geometry: number of core residues (at 95% burial), indicating how good the packing in the interface is. Note this score is not shown in the user interface but it is taken into account for the final call.
core-surface score: a z-score of sequence entropy of core residues (at 70% burial) versus random samples of all surface residues
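The core-surface score can be illustrated with a minimal sketch. It assumes that the statistic compared is the mean sequence entropy, and that the reference distribution comes from repeatedly drawing as many surface residues as there are core residues. The function name, the sample count and the seed are hypothetical. EPPIC's exact sampling scheme may differ.

```python
import random
import statistics


def core_surface_score(core_entropies, surface_entropies, n_samples=10000, seed=0):
    """z-score of the mean core entropy against random same-size surface samples.

    A negative value means the core is more conserved (lower entropy) than
    randomly chosen surface residues, which points towards a biological interface.
    """
    rng = random.Random(seed)
    k = len(core_entropies)
    # Distribution of mean entropy over random surface samples of size k.
    sample_means = [
        statistics.fmean(rng.sample(surface_entropies, k)) for _ in range(n_samples)
    ]
    mu = statistics.fmean(sample_means)
    sigma = statistics.stdev(sample_means)
    return (statistics.fmean(core_entropies) - mu) / sigma
```

Using a z-score rather than a raw entropy difference normalizes for how variable the surface of a given protein is.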

Screenshot: the interfaces table

Each of these indicators has predefined score thresholds that produce one of the following calls:

BIO, the interface is biologically relevant
XTAL, the interface is only a crystal lattice contact
NOPRED, there is not enough information available to make a decision (usually not enough sequence data)

You can see the scores for each of the indicators next to the bio/xtal/nopred labels.

The two scores are used to calculate a final score and a probability of the interface being biologically relevant (1 being certainly biological, 0 certainly crystal contact). The call and the probability appear in the "Final" column. BIO will mean that the probability is above 0.5 and XTAL that the probability is below 0.5. This is the final prediction column and what you need to look at first. An estimated confidence level for the prediction is depicted with stars, golden star for high confidence, gray star for medium confidence, no star for low confidence.
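The mapping from final probability to call can be sketched as below. Two choices here are assumptions not stated above. A missing probability (`None`) stands for the NOPRED case, and a probability of exactly 0.5 is reported as XTAL.

```python
def final_call(probability):
    """Map a final probability (1 = biological, 0 = crystal contact) to a call.

    Assumptions: None means no prediction was possible (NOPRED), and an exact
    0.5, which the help text leaves unspecified, is reported as XTAL.
    """
    if probability is None:
        return "NOPRED"
    return "BIO" if probability > 0.5 else "XTAL"
```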

The other columns in the interfaces table describe a few important parameters of each interface: the chain codes of the two partners (e.g. "A+B"), the buried surface area upon interface formation (interfaces are sorted by this value), and the icon of the crystal operator used to generate the second partner of the interface.

The operators are represented as icons to show at a glance what kind of crystallographic symmetry is present at the interface. The actual full algebraic operator (e.g. "-X+1,Y-1/2,-Z") can still be seen by hovering the mouse over the icon. The icons used for the operators are mostly the standard ones found in crystallographic tables: the identity operator (i.e. an interface in the asymmetric unit), a crystal translation (integer) without rotation, a re-centering translation without rotation, a 2-fold axis, a 2-fold screw axis, a 3-fold axis, a 3-fold screw axis, a 4-fold axis, a 4-fold screw axis, a 6-fold axis, a 6-fold screw axis.

For the rare cases where a protein is crystallized in non-chiral space groups (e.g. racemic mixtures) there are additional operators: an inversion centre, a mirror plane, a glide plane, an improper 3-fold axis, an improper 4-fold axis, an improper 6-fold axis.

Some of these operators lead to the formation of infinite interfaces if they occur between two crystallographically related copies of the same molecule (e.g. A+A). This happens for pure translations and for any screw rotation, and it is generally a very strong indication of a crystal contact. In those cases, we color the operator icon in red. The final call does not take this information into account, but it is essential for the enumeration of valid assemblies shown in the assemblies table.
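The rule above can be checked directly from an algebraic operator string such as "-X+1,Y-1/2,-Z". The sketch below is our own illustration, not EPPIC's code. It parses the operator into an integer rotation matrix R and a fractional translation t, then computes the intrinsic translation w = (1/n) Σ R^k t for k = 0..n-1, where n is the order of R. Applying the operator n times gives a pure lattice translation by n·w. A nonzero w therefore means an A+A interface repeats forever, which is the case for pure translations and screw rotations. Improper operators (det R = -1) relate a molecule to its mirror image, so they cannot produce an A+A interface in this sense. The sketch simply reports them as not infinite.

```python
import re
from fractions import Fraction

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def parse_operator(op):
    """Parse e.g. "-X+1,Y-1/2,-Z" into (integer rotation rows, fractional translation)."""
    rot, trans = [], []
    for comp in op.upper().replace(" ", "").split(","):
        row, t = [0, 0, 0], Fraction(0)
        for sign, term in re.findall(r"([+-]?)([^+-]+)", comp):
            s = -1 if sign == "-" else 1
            if term in ("X", "Y", "Z"):
                row["XYZ".index(term)] += s
            else:
                t += s * Fraction(term)
        rot.append(row)
        trans.append(t)
    return rot, trans


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rotation_order(r):
    """Smallest n (at most 6 for crystallographic operators) with R^n = I."""
    power = r
    for n in range(1, 7):
        if power == IDENTITY:
            return n
        power = matmul(power, r)
    raise ValueError("not a crystallographic rotation")


def intrinsic_translation(r, t):
    """w = (1/n) * sum_{k=0}^{n-1} R^k t, the screw/translational part of (R, t)."""
    n = rotation_order(r)
    total, power = [Fraction(0)] * 3, IDENTITY
    for _ in range(n):
        total = [a + b for a, b in zip(total, matvec(power, t))]
        power = matmul(power, r)
    return [x / n for x in total]


def is_infinite(op):
    """True if the operator would generate an infinite A+A interface."""
    rot, trans = parse_operator(op)
    if det3(rot) != 1:
        return False  # improper operators: not covered by the rule in the text
    return any(x != 0 for x in intrinsic_translation(rot, trans))
```

For example, `is_infinite("-X+1,Y-1/2,-Z")` is `True` (a 2-fold screw along b). `is_infinite("-X+1,-Y,Z")` is `False`: this is a plain 2-fold axis that does not pass through the origin.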

Viewing interfaces and assemblies in 3D

The thumbnails in the assemblies and interfaces tables give a visual cartoon representation of the assemblies and interfaces. Clicking on them opens an interactive 3D view in the Mol* viewer. In the interface view, the two protomers are represented as cartoons, with interface residues also shown as sticks. Core residues from the two protomers are shown in two different shades of red. The sequence entropy value of each residue is written in the B-factor field of the .cif file.

The lattice graph

The assemblies are analysed through a graph representation of the chains and interfaces. This lattice graph is a periodic graph, with chains being the nodes and interfaces the edges. Visual tools to look at the graph are provided. They can be useful in understanding the crystal packing and the different possible assemblies that can be constructed with the given connectivity.
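As a simplified illustration of how a set of engaged interfaces splits the chains into assemblies, the sketch below keeps only edges whose interface id is engaged and returns the connected components. This ignores the periodicity of the real lattice graph: it does not detect the infinite assemblies caused by translational or screw operators, and it does not check isomorphism. The function name and input shapes are hypothetical.

```python
from collections import defaultdict


def assembly_components(chains, edges, engaged):
    """Connected components of a finite chain/interface graph.

    chains: iterable of chain ids (nodes).
    edges: iterable of (chain_a, chain_b, interface_id) tuples.
    engaged: set of interface ids considered part of the assembly.
    """
    adjacency = defaultdict(set)
    for a, b, interface_id in edges:
        if interface_id in engaged:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, result = set(), []
    for start in chains:
        if start in seen:
            continue
        # Iterative depth-first search from each unvisited chain.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(component))
    return result
```

With chains A to D and edges A-B and C-D (interface 1) plus B-C (interface 2), engaging only interface 1 gives two dimers [A, B] and [C, D]. Engaging both interfaces gives a single tetramer.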

2D lattice graph view (visjs): the graph is shown in a dynamic 2D layout. Different colors are assigned to different molecular entities and to distinct interfaces. This view can be obtained by clicking on any of the assembly diagram thumbnails in the assemblies table.

The sequences table

The header of the Sequence Information tab shows which version of the UniProt database is used to find homologs for the EPPIC multiple sequence alignments. A new UniProt release appears a few times a year, each containing more sequences. More sequences yield better predictions (see this figure), so the growth of the UniProt sequence database has quite an important effect on the accuracy of our method. We will try to keep the results as up-to-date as possible and to update the PDB-wide precomputed results every month (for every UniProt update). In any case, if you use our results, it is important to quote the UniProt database version used.

The sequences table provides information about the sequence homologs in the Multiple Sequence Alignment (MSA) used for entropy calculation. This information is given for all unique chains (protein polymer entities) in the structure. More details of the sequence of a particular homolog can be found by clicking the UniProt link.

Note that, for the MSA calculation, sequences are clustered so that no pair of sequences is more similar than a certain threshold. The link in the right-most column lets you download the MSA of all homolog sequences in FASTA format. To display it nicely, you will need an alignment viewer such as Jalview.
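A downloaded MSA can also be inspected programmatically. The sketch below parses FASTA text and computes the Shannon entropy (natural log) of each alignment column, ignoring gap characters. It illustrates per-residue sequence entropy only. EPPIC may use a reduced amino-acid alphabet, a different logarithm base or a different gap treatment, so values from this sketch are not expected to match the server's. The function names are hypothetical.

```python
import math
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def column_entropies(records):
    """Shannon entropy (natural log) per alignment column, ignoring '-' and '.' gaps."""
    sequences = [s for _, s in records]
    entropies = []
    for column in zip(*sequences):
        residues = [c for c in column if c not in "-."]
        n = len(residues)
        counts = Counter(residues)
        entropies.append(
            -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0
        )
    return entropies
```

Usage: `column_entropies(read_fasta(open("alignment.fasta").read()))`, where the file name is a placeholder for the downloaded MSA.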

Known issues

MHC and antibody interfaces not correctly predicted: due to the special nature of MHC and antibody sequences, the evolutionary criteria used by EPPIC do not hold for them. Thus the predictions for interfaces involving at least one MHC or antibody molecule will often be incorrect.

Job identifiers

Every new job run on the server is assigned a unique job identifier, a long alphanumerical string that is only known by the user that submits it and that is very hard to guess. It is recommended that you give an email address while submitting so that you receive the URL with the job identifier in your inbox. Otherwise you will have to bookmark it or keep a record of it yourself. It is always possible to retrieve the job by using the URL https://www.eppic-web.org/assemblies/<my_job_id>. Whether the job is still running or already done, the URL will show its current status and automatically display the final results whenever it is finished. To share the results of a job with colleagues just send them the corresponding URL. The jobs will be stored in our servers for 1 month and then deleted.

The PDB-wide precomputed results can be accessed directly by using the permanent URLs:

https://www.eppic-web.org/assemblies/<PDB_code>
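If you keep many job identifiers or PDB codes, a small helper can build their result URLs from the pattern given on this page. This sketch only builds strings and makes no network request. The function name is hypothetical, and the identifier's case is left unchanged because the server's case handling is not stated here.

```python
from urllib.parse import quote

BASE_URL = "https://www.eppic-web.org/assemblies/"


def eppic_url(identifier):
    """Result URL for a PDB code or a job identifier (pattern taken from this page)."""
    return BASE_URL + quote(identifier.strip(), safe="")
```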

Funding

Funding to the project came initially from the Paul Scherrer Institute (2010-2014) and later from the Swiss National Science Foundation (2013-2016). Since 2016 the RCSB Protein Data Bank has supported the project and enabled its continuation. The RCSB PDB is funded by a grant (DBI-1338415) from the National Science Foundation, the National Institutes of Health, and the US Department of Energy.