Batch inference of protein structure

Run AlphaFold3 at scale on Euler, with the data pipeline (MSA) and structure prediction steps parallelised across nodes. As an example, the E. coli reference proteome has 4,402 monomers. The data pipeline steps took 2 days with up to 500 CPU jobs running simultaneously, and the structure prediction steps took ~4 hours with ~15 GPU jobs running simultaneously. A small number of inputs failed and had to be re-run.

- The data pipeline runs on CPU-only nodes, with each input as a separate job. Runtime per input ranges from an hour to a few days. Jobs that run out of RAM or runtime are automatically restarted with increased resources.
- Structure prediction runs on nodes with an A100 GPU and typically takes minutes per input. The runtime is predictable from the number of input tokens. We can use this to group inputs by size and run one structure prediction job per group, which minimizes model start-up, recompilation, and job scheduler waiting time.
- Inputs and outputs are staged on local scratch and compressed with gzip (~5x reduction in space and traffic).
- The monomer data pipeline output can be reused to generate inputs for multimer structure prediction. This can speed up interaction screens, e.g. protein-protein or protein-ligand...
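The automatic restart with increased resources can be pictured as a resource request that grows with each attempt. This is a minimal, hypothetical shell sketch: the function names, the base values (16 GB, 4 hours) and the doubling factor are assumptions for illustration, not the pipeline's actual settings. Workflow managers such as Snakemake support the same pattern by passing an attempt counter to a rule's resource functions.

```shell
# Hypothetical sketch: resources requested on attempt 1, 2, 3, ...
# Base values and the doubling factor are assumptions, not batch-infer's settings.
BASE_MEM_MB=16000
BASE_RUNTIME_MIN=240

mem_mb_for_attempt() {
  local attempt=$1
  # Double the memory request on every retry: 1x, 2x, 4x, ...
  echo $(( BASE_MEM_MB * (1 << (attempt - 1)) ))
}

runtime_min_for_attempt() {
  local attempt=$1
  # Double the runtime limit on every retry as well
  echo $(( BASE_RUNTIME_MIN * (1 << (attempt - 1)) ))
}

# Show what three consecutive attempts would request
for attempt in 1 2 3; do
  echo "attempt $attempt: mem=$(mem_mb_for_attempt "$attempt")MB runtime=$(runtime_min_for_attempt "$attempt")min"
done
```

On a Slurm cluster, values like these would be passed to sbatch as `--mem` (megabytes by default) and `--time` (a plain number is read as minutes) when the job is resubmitted.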
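The size-based grouping for structure prediction can be sketched by assigning each input to a token bucket. As we understand AlphaFold3's `--buckets` option, inputs are padded up to the next bucket size and the model is compiled once per bucket, so inputs in the same bucket can share one job without recompiling. This is a hypothetical sketch, not batch-infer's own grouping code: it assumes one `name tokens` pair per line on stdin (for proteins, roughly one token per residue) and AlphaFold3's default bucket sizes, and it places any input larger than the largest bucket in a group of its own.

```shell
# Hypothetical sketch: assign each input to a token bucket.
# Bucket edges are assumed to be AlphaFold3's default --buckets values.
BUCKETS="256 512 768 1024 1280 1536 2048 2560 3072 3584 4096 4608 5120"

group_by_bucket() {
  # Reads "name tokens" lines on stdin, prints "bucket name" sorted by bucket, then name.
  awk -v buckets="$BUCKETS" '
    BEGIN { n = split(buckets, edge, " ") }
    {
      b = $2  # larger than every bucket: the input forms its own group
      for (i = 1; i <= n; i++) {
        if ($2 + 0 <= edge[i] + 0) { b = edge[i]; break }
      }
      print b " " $1
    }' | sort -k1,1n -k2,2
}
```

For example, `group_by_bucket < tokens.txt | awk '{print $1}' | uniq -c` prints how many inputs fall into each group, i.e. how many inputs each structure prediction job would handle.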
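Staging on local scratch with gzip-compressed input and output might look like the helper below. It is a hypothetical sketch: the function name is made up, it assumes node-local scratch is exposed as `$TMPDIR` (on Slurm clusters such as Euler this is typically requested with `sbatch --tmp`), and the command it runs stands in for the actual AlphaFold3 step.

```shell
# Hypothetical helper: decompress an input onto local scratch, run a command on it,
# and write the gzipped result back to shared storage.
# Assumes $TMPDIR points at node-local scratch; falls back to /tmp otherwise.
run_on_local_scratch() {
  local input_gz=$1 output_gz=$2
  shift 2  # the remaining arguments are the command to run
  local workdir
  workdir=$(mktemp -d "${TMPDIR:-/tmp}/batch-infer.XXXXXX") || return 1
  # Decompress the input onto local disk
  gzip -dc "$input_gz" > "$workdir/input.json" || { rm -rf "$workdir"; return 1; }
  # Placeholder command: reads input.json on stdin, writes output.json on stdout
  "$@" < "$workdir/input.json" > "$workdir/output.json" || { rm -rf "$workdir"; return 1; }
  # Compress the result back to shared storage
  gzip -c "$workdir/output.json" > "$output_gz"
  local status=$?
  rm -rf "$workdir"
  return $status
}
```

For example, `run_on_local_scratch in.json.gz out.json.gz cat` simply copies the input through local scratch. Running `gzip -l out.json.gz` reports the compression ratio, which is one way to check the ~5x figure on your own outputs.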
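Reusing monomer data pipeline output pays off because the expensive MSA step runs once per protein, while only structure prediction runs once per pair. For the 4,402 E. coli monomers, an all-vs-all screen would have 4,402 × 4,401 / 2 = 9,686,601 pairs, but still only 4,402 data pipeline runs. The hypothetical helper below lists the unordered pairs for such a screen; building each multimer input from the two monomer outputs is not shown.

```shell
# Hypothetical sketch: print every unordered pair "A_B" of the given IDs once,
# with A before B in argument order. n IDs give n*(n-1)/2 pairs.
all_pairs() {
  local first other
  while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for other in "$@"; do
      printf '%s_%s\n' "$first" "$other"
    done
  done
}
```

For example, `all_pairs protA protB protC` prints `protA_protB`, `protA_protC` and `protB_protC`.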

Quick start

This will run AlphaFold3 for all input .json files in results/alphafold3_adhoc_examples/alphafold3_jsons/

Clone the repository:

cd /cluster/scratch/$USER
git clone --recurse-submodules https://github.com/jurgjn/batch-infer.git
cd batch-infer

Edit results/alphafold3_adhoc_examples/config.yaml to point to your AlphaFold3 model parameters. These are obtained from DeepMind on a per-user basis.
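To run your own proteins, one option is to add inputs in the AlphaFold3 JSON input format to results/alphafold3_adhoc_examples/alphafold3_jsons/ before starting the pipeline. In this minimal sketch, the file name, job name, sequence and seed are placeholders, and we assume the pipeline accepts any valid AlphaFold3 input file in that folder.

```shell
# Hypothetical example: add one monomer input to the folder the pipeline reads.
# Name, sequence and seed are placeholders; fields follow AlphaFold3's JSON input format.
INPUT_DIR=${INPUT_DIR:-results/alphafold3_adhoc_examples/alphafold3_jsons}
mkdir -p "$INPUT_DIR"
cat > "$INPUT_DIR/my_protein.json" <<'EOF'
{
  "name": "my_protein",
  "sequences": [
    {"protein": {"id": "A", "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"}}
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
# Count the .json inputs that would be picked up
find "$INPUT_DIR" -maxdepth 1 -name '*.json' | wc -l
```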

Start the pipeline with:

./batch-infer alphafold3_onegpu results/alphafold3_adhoc_examples | sbatch

See alphafold3_adhoc_examples.ipynb for a more detailed walk-through.
