import React from 'react';
import Footer from "./Footer";
const DownloadPage = () => {
  return (
    <div className="doc-page">
      <div>

        <h2>Downloading data and mmCIF files of interfaces or assemblies</h2>
        <p>
          The EPPIC web interface is powered by a <a href="https://eppic-rest.rcsb.org" target="_blank" rel="noreferrer">REST API</a>,
          which is also publicly available for data retrieval.
          Endpoints exist for each type of data object that can be retrieved: interfaces, assemblies, residues, multiple sequence alignments,
          and mmCIF files for each assembly or interface.
          Data can be obtained either for precomputed PDB ids or for user jobs (use the long alphanumeric job id as the identifier).
          Apart from the mmCIF files, the data returned by the REST services is in JSON format.
        </p>

        <p>
          Interface ids are assigned by EPPIC in order of decreasing interface area, from the largest (1) to the smallest (n). Assembly ids
          are sorted from lower to higher stoichiometry.
          Note that in the downloaded mmCIF files the B-factor column is replaced
          by the per-residue sequence entropy values. Chains that are transformed
          by a symmetry operator (symmetry partners) are named &lt;original_chain_id&gt;_&lt;operator_id&gt;.
        </p>

        <h2>Software and source code</h2>
        <p>
          The EPPIC web server is a web GUI to the EPPIC command-line program,
          which is written in Java. The latest version of the program is available <a href="../downloads/eppic.zip">here</a>. If you need to run it often
          or want to tweak the parameters, we recommend that you use the command-line
          version. It has been tested only on Linux, but it should
          also work on macOS. It requires <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download" target="_blank" rel="noreferrer">BLAST</a> and <a href="http://www.clustal.org/omega/" target="_blank" rel="noreferrer">Clustal
          Omega</a>.
        </p>
        <p>
          Running the command-line EPPIC program requires Java 17 or newer.
        </p>
        <p>
          The source code is <a href="https://github.com/eppic-team/eppic" target="_blank" rel="noreferrer">available</a>&nbsp;
          under the GPL license. You can get it with the following Git command:
        </p>
        <p>
          <code>git clone https://github.com/eppic-team/eppic</code>
        </p>
        <p>
          EPPIC uses the open source <a href="http://www.biojava.org" target="_blank" rel="noreferrer">BioJava</a> library.
        </p>
        <p>
          Please <a href="mailto:info@rcsb.org">contact</a> us if you have
          problems with the software or want to send us any feedback.
        </p>

        <h2>Datasets</h2>
        <p>
          The datasets used for developing the EPPIC method (see the <a href="http://www.biomedcentral.com/1471-2105/13/334" target="_blank" rel="noreferrer">paper</a>)
          can be downloaded as plain text files:
        </p>
        <ul>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/DCxtal.txt" target="_blank" rel="noreferrer">DCxtal set</a>: a set of
            crystal contacts with large interface areas (&gt;1000 Å<sup>2</sup>)
          </li>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/DCbio.txt" target="_blank" rel="noreferrer">DCbio set</a>: a set of
            biologically relevant interfaces with relatively small interface
            areas (&lt;2000 Å<sup>2</sup>)
          </li>
        </ul>
        <p>
          The area distributions of the DCxtal and DCbio interfaces, as seen in&nbsp;
          <a href="http://www.biomedcentral.com/1471-2105/13/334/figure/F1" target="_blank" rel="noreferrer">this plot</a>, overlap substantially. This is a
          distinctive feature of the sets: in general, crystal interfaces tend to be
          small and biologically relevant ones large, so in these sets interface area
          alone cannot discriminate between the two. Also note that
          all entries in the sets were selected for crystallographic quality by
          filtering on resolution and Rfree.
        </p>
        <p>
          The files list PDB codes, each with a list of interface
          ids as calculated by EPPIC, i.e. id 1 corresponds to the
          largest interface in the crystal, with increasing ids for smaller
          interfaces. If no interface id is given on a line, interface 1 is
          implied. Lines starting with "#" are comments.
        </p>
        <p>
          We further compiled (see <a href="http://www.biomedcentral.com/1472-6807/13/21" target="_blank" rel="noreferrer">paper</a>)
          a new dataset of experimentally validated transmembrane protein
          oligomeric structures. It can also be downloaded as a text file:
        </p>
        <ul>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/TMPbio.txt" target="_blank" rel="noreferrer">TMPbio set</a>: a set of
            biological interfaces spanning the transmembrane region, from both
            alpha and beta TMP subclasses
          </li>
        </ul>
        <p>
          We then automatically derived two large-scale datasets of crystal and biological
          contacts, called XtalMany and BioMany, respectively
          (<a href="http://www.biomedcentral.com/1472-6807/14/22" target="_blank" rel="noreferrer">Baskaran et al. 2014</a>),
          each containing nearly 3000 entries. XtalMany is based on the concept of symmetry operators
          that lead to infinite assemblies: the rationale is that biological assemblies are finite, so an
          interface generated by such an operator is taken to be a crystal contact. BioMany is based mainly on
          the concept of interfaces shared across crystal forms: it is a subset of <a href="http://dunbrack2.fccc.edu/protcid/" target="_blank" rel="noreferrer">ProtCID</a>&nbsp;
          from the Dunbrack group, selected with very stringent parameters.
          In addition, it contains interfaces from dimeric proteins whose structures were solved both
          by crystallography and by NMR: the idea is that the NMR dimer validates the dimeric biological unit
          of the corresponding crystal structure.
          XtalMany and BioMany can be downloaded as text files:
        </p>
        <ul>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/ManyXtal.txt" target="_blank" rel="noreferrer">XtalMany set</a>: a large
            dataset of crystal contacts
          </li>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/ManyBio.txt" target="_blank" rel="noreferrer">BioMany set</a>: a large
            dataset of biological interfaces
          </li>
        </ul>
        <p>
          In <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006104" target="_blank" rel="noreferrer">Bliven et al. 2018</a> we
          used a dataset of assemblies extracted from the biological assembly annotations in the PDB (only the first biological assembly, "PDB1", was used).
          We kept the biological assemblies with good consensus within their 70% sequence identity clusters; full details are given in the&nbsp;
          <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006104" target="_blank" rel="noreferrer">paper</a>.
        </p>
        <ul>
          <li>
            <a href="https://github.com/eppic-team/datasets/blob/master/data/benchmark-pdb1_clusters.csv" target="_blank" rel="noreferrer">
              Consensus assemblies (PDB1 clusters)
            </a>
          </li>
        </ul>
        <p>
          Please note that the original publications also include the datasets together with
          our full annotations. However, we cannot update those versions if we find mistakes.
          The datasets linked here are the most up-to-date and best-validated
          versions, so please use them in preference to the ones available in the
          original publications.
        </p>
      </div>
      <Footer fixbottom={false}/>
    </div>
  );
};

export default DownloadPage;