```bash
python score_round.py path/to/modelsFolders my_output.csv
```

Here, `path/to/modelsFolders` is a directory containing one or more model subdirectories, structured as follows:

```
path/to/modelsFolders/
├── 1a14/
│   ├── 1a14_001.pdb
│   ├── 1a14_002.pdb
│   └── ...
└── 5yy1/
    ├── 5yy1_001.pdb
    └── 5yy1_002.pdb
```
Important: Each subfolder (e.g. `1a14`, `5yy1`) must contain models from one system only (the same antibody and antigen). If you have multiple different systems (e.g. when testing ARID on solved structures), use `score_refs.py` instead.
Note: ARID processes each subfolder independently and parallelizes tasks across all models within it. Performance is optimal when each subfolder contains thousands of models and multiple CPUs are available.
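If your models currently sit in one flat directory, a short script can sort them into the per-system layout shown above. This is a minimal sketch, not part of ARID-sf: `group_models_by_system` is a hypothetical helper, and it assumes each file name starts with a system identifier followed by an underscore (as in `1a14_001.pdb`). It copies files rather than moving them, so the originals stay untouched.

```python
import shutil
from pathlib import Path


def group_models_by_system(flat_dir, out_dir):
    """Copy <system>_<anything>.pdb files from flat_dir into out_dir/<system>/.

    Assumption: the system ID is everything before the first underscore.
    Returns a dict mapping each system ID to the number of models copied.
    """
    flat_dir, out_dir = Path(flat_dir), Path(out_dir)
    counts = {}
    for pdb in sorted(flat_dir.glob("*.pdb")):
        system = pdb.stem.split("_", 1)[0]        # "1a14_001" -> "1a14"
        dest = out_dir / system
        dest.mkdir(parents=True, exist_ok=True)   # one subfolder per system
        shutil.copy2(pdb, dest / pdb.name)
        counts[system] = counts.get(system, 0) + 1
    return counts
```

For example, `group_models_by_system("all_models", "path/to/modelsFolders")` would produce one subfolder per system prefix, which can then be passed to `score_round.py`.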
1. Clone the repository:
```bash
git clone https://github.com/DSIMB/ARID-sf.git
cd ARID-sf
```

2. Create and activate a conda environment:

```bash
mamba create -n arid-env python=3.11 -c conda-forge
mamba activate arid-env
```

3. Install dependencies:

```bash
mamba install -c conda-forge numpy=1.23.5 pandas=2.3.1 httpx cython matplotlib pyparsing biotite mdtraj
```

4. Install PyTorch and CUDA (pytorch 2.8.0+cu128):
See pytorch.org/get-started/locally for the right command for your machine. For example:

```bash
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
```

5. Install ESM-C:

```bash
pip install esm==3.2.1
```

Note: pip may report an error, but the required packages should install successfully.

6. Build the Cython extension:

```bash
cd ARIDv2.0
python setup_v2.py build_ext --inplace
cd ..
```

7. Test your installation:

```bash
python3 ARIDv2.0/score_round.py example/models example/outputs/example_output.csv
```

For the program to work correctly, each file must respect the following requirements:
- The file must be in `.pdb` format
- All antigen chains must be renamed `"A"` (even if there are multiple chains)
- All antibody chains must be renamed `"B"` (even if there are multiple chains)
- Antigen atoms/chains (`"A"`) must appear first in the file
- Antibody atoms/chains (`"B"`) must appear second (after the antigen) in the file
- Residues must be renumbered from `1` to `N`, where `N` = antigen residues + antibody residues
- Atoms must follow the OPLS United Atom (UA) force field nomenclature (see below)
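Most of these requirements can be checked mechanically before scoring. The sketch below is a hypothetical pre-flight checker (not shipped with ARID-sf) that reads the fixed PDB columns for chain ID (column 22), residue number (columns 23-26) and insertion code (column 27). It checks the file extension, that only chains `A` and `B` occur, that no `A` atom follows a `B` atom, and that residues are numbered consecutively from 1 with no insertion codes. It does not check OPLS UA atom names.

```python
import sys
from pathlib import Path


def check_arid_lines(lines):
    """Check chain IDs, chain order and residue numbering of PDB ATOM/HETATM records.

    Returns a list of problems (empty if these checks pass). Atom names are NOT
    checked against the OPLS UA nomenclature.
    """
    problems = []
    seen_b = False      # True once the first antibody (chain "B") atom has been read
    prev_key = None     # (chain, resSeq, iCode) of the previous atom
    expected = 1        # residue number expected next (1..N across both chains)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain = line[21]                   # PDB column 22: chain identifier
        resseq = int(line[22:26])          # PDB columns 23-26: residue number
        icode = line[26:27].strip()        # PDB column 27: insertion code
        if chain not in ("A", "B"):
            problems.append(f"unexpected chain ID {chain!r}")
        elif chain == "B":
            seen_b = True
        elif seen_b:                       # chain "A" after a "B" atom
            problems.append('antigen chain "A" appears after antibody chain "B"')
        key = (chain, resseq, icode)
        if key != prev_key:                # first atom of a new residue
            if resseq != expected or icode:
                problems.append(f"residue {chain}{resseq}{icode} found, expected {expected}")
            expected = resseq + 1          # resynchronise to avoid cascading reports
            prev_key = key
    return list(dict.fromkeys(problems))   # drop duplicate messages, keep order


def check_arid_file(pdb_path):
    """Run check_arid_lines on a file, also checking the .pdb extension."""
    path = Path(pdb_path)
    problems = [] if path.suffix == ".pdb" else [f"{path.name}: not a .pdb file"]
    return problems + check_arid_lines(path.read_text().splitlines())


if __name__ == "__main__":
    for name in sys.argv[1:]:
        issues = check_arid_file(name)
        print(name, "OK" if not issues else "; ".join(issues))
```

Running it over a model folder (e.g. `python check_models.py path/to/modelsFolders/1a14/*.pdb`, with a hypothetical script name) lists every file that would violate the chain or numbering rules.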
Make sure to run HADDOCK with the antigen chain as `"A"` and the antibody chain as `"B"`.
If so, you can run ARID-sf directly on HADDOCK3 models.
An example of running HADDOCK for Ab-Ag modelling.
In the OPLS UA nomenclature, only polar hydrogens are present. The expected atom names and residue type names are listed in `src/ARIDv2.0/lookup_dict.py`.
An easy way to convert an antibody-antigen complex `.pdb` file to OPLS UA format is to use the HADDOCK3 `[topoaa]` module. An example script is available at `src/benchmark_sets/model_to_ua.py`.
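For a single complex, a minimal HADDOCK3 workflow containing only the `[topoaa]` module is enough. The sketch below is an independent alternative to `model_to_ua.py`, not a copy of it. It assumes the `haddock3` command-line tool is installed and on `PATH`, and that the global `run_dir`, `ncores` and `molecules` parameters followed by a bare `[topoaa]` section form a valid workflow. The function names are hypothetical, and where HADDOCK3 writes the converted structures inside `run_dir` should be checked in its own output.

```python
import json
import subprocess
from pathlib import Path


def write_topoaa_config(pdb_file, run_dir="run_topoaa", cfg_path="topoaa.cfg", ncores=1):
    """Write a minimal HADDOCK3 workflow that only runs the [topoaa] module.

    json.dumps produces double-quoted strings, which are valid TOML strings
    for ordinary file paths.
    """
    cfg = (
        f"run_dir = {json.dumps(str(run_dir))}\n"
        f"ncores = {int(ncores)}\n"
        f"molecules = [{json.dumps(str(pdb_file))}]\n"
        "\n[topoaa]\n"
    )
    Path(cfg_path).write_text(cfg)
    return cfg


def run_topoaa(cfg_path="topoaa.cfg"):
    """Launch HADDOCK3 on the config; raises CalledProcessError on failure."""
    subprocess.run(["haddock3", str(cfg_path)], check=True)
```

Typical use would be `write_topoaa_config("complex_AB.pdb")` followed by `run_topoaa()`. The resulting file should still be checked against the chain-order and numbering requirements listed above before scoring.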
A set of helper scripts is available in ./formatting. These allow you to:
- Rechain and renumber PDB files
- Run the HADDOCK3 topology module to obtain OPLS UA nomenclature
- Parse HADDOCK3 output and copy ARID-sf-ready files into a target directory
To create correct topologies, these scripts require HADDOCK3.
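To illustrate what the rechain-and-renumber step has to do, here is a minimal pure-Python sketch. It is not the code in `./formatting`: `rechain_renumber` and `rechain_file` are hypothetical names, and the original chain IDs of the antigen and antibody must be supplied by the user. It writes antigen chains first as `A`, then antibody chains as `B`, renumbers residues `1..N` across both, and drops chains that are not listed as well as non-coordinate records. Atom names are left unchanged, so the OPLS UA conversion is still needed afterwards, and residue numbers above 9999 would overflow the PDB column.

```python
from pathlib import Path


def rechain_renumber(lines, antigen_chains, antibody_chains):
    """Return ATOM/HETATM lines with antigen chains -> "A" (first), antibody
    chains -> "B" (second) and residues renumbered 1..N across both."""
    new_chain = {c: "A" for c in antigen_chains} | {c: "B" for c in antibody_chains}
    atoms = [l for l in lines if l.startswith(("ATOM", "HETATM"))]
    # Antigen chains first, then antibody chains, each in the order given.
    ordered = [l for cid in [*antigen_chains, *antibody_chains] for l in atoms if l[21] == cid]
    out, prev_key, resnum = [], None, 0
    for serial, line in enumerate(ordered, start=1):
        key = (line[21], line[22:27])     # original chain + resSeq + insertion code
        if key != prev_key:               # first atom of a new residue
            resnum += 1
            prev_key = key
        # Rebuild fixed columns: serial (7-11), chain (22), resSeq (23-26), blank iCode (27).
        out.append(line[:6] + f"{serial % 100000:5d}" + line[11:21]
                   + new_chain[line[21]] + f"{resnum:4d}" + " " + line[27:])
    return out


def rechain_file(src, dst, antigen_chains, antibody_chains):
    """Apply rechain_renumber to a PDB file and write the result to dst."""
    lines = Path(src).read_text().splitlines()
    body = rechain_renumber(lines, antigen_chains, antibody_chains)
    Path(dst).write_text("\n".join(body) + "\nEND\n")
```

For example, for a complex whose antigen is chain `C` and whose antibody has heavy and light chains `H` and `L` (hypothetical IDs): `rechain_file("complex.pdb", "complex_AB.pdb", ["C"], ["H", "L"])`.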
A `.csv` file with the following structure:

```csv
Model,pDockQ,pFNAT_env,pJACA,pJACB
1A14_1_4908_R_4908,0.29489622,0.503456,0.65096414,0.61525863
1A14_1_5389_R_5389_ti,0.11942382,0.34318617,0.44313583,0.4278602
...
```
- `Model`: the scored file name
- `pDockQ`: the predicted ARID-sf score
  - A score close to 1 indicates a good prediction
  - A score close to 0 indicates an incorrect prediction
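Since higher `pDockQ` means a better model, ranking the output is a one-liner in pandas (already installed as a dependency). The sketch below uses a hypothetical helper, `top_models`, and feeds it the two example rows shown above through `io.StringIO`; with real output you would pass the CSV path instead. No score threshold is applied, because none is defined here.

```python
import io

import pandas as pd


def top_models(csv_source, n=10):
    """Return the n models with the highest pDockQ from an ARID-sf output CSV."""
    df = pd.read_csv(csv_source)
    return df.sort_values("pDockQ", ascending=False).head(n).reset_index(drop=True)


# Usage with the example rows above (a real run would pass "my_output.csv").
example = """Model,pDockQ,pFNAT_env,pJACA,pJACB
1A14_1_4908_R_4908,0.29489622,0.503456,0.65096414,0.61525863
1A14_1_5389_R_5389_ti,0.11942382,0.34318617,0.44313583,0.4278602
"""
best = top_models(io.StringIO(example), n=1)
print(best.loc[0, "Model"])  # 1A14_1_4908_R_4908
```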
To customize the computation, edit the following variables in `score_round.py`:

| Parameter | Default | Description |
|---|---|---|
| `n_workers` | 40 | Number of CPU workers |
| `batch_size` | 1000 | Number of models per batch |
| `feature_cap` | 1000 | Feature boundaries in case of clashes (do not change) |
| `cap_length` | 75 | Max number of residues considered per model interface (do not change) |
| `CAP_MEMORY` | 1000 | Number of models processed before writing to disk (RAM-dependent) |
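The default `n_workers = 40` assumes a large machine. A quick way to pick a value that does not exceed the CPUs actually available is sketched below; `suggested_n_workers` and `make_batches` are hypothetical helpers, not functions from `score_round.py`. `make_batches` only illustrates what "models per batch" means and makes no claim about how ARID-sf schedules batches internally.

```python
import os


def suggested_n_workers(default=40):
    """Cap the worker count at the number of CPUs this process can actually use."""
    if hasattr(os, "sched_getaffinity"):
        available = len(os.sched_getaffinity(0))   # Linux: respects the CPU affinity mask (e.g. taskset)
    else:
        available = os.cpu_count() or 1            # macOS/Windows fallback
    return max(1, min(default, available))


def make_batches(models, batch_size=1000):
    """Split a list of model paths into consecutive chunks of at most batch_size."""
    return [models[i:i + batch_size] for i in range(0, len(models), batch_size)]


if __name__ == "__main__":
    print("n_workers =", suggested_n_workers())
    print([len(b) for b in make_batches([f"m{i}.pdb" for i in range(2500)])])  # [1000, 1000, 500]
```

On a shared cluster node, the affinity-based count is usually the safer choice, since `os.cpu_count()` reports every CPU on the machine rather than those assigned to your job.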