FastPLMs provides optimized, HuggingFace-compatible implementations of protein language models (PLMs) with pluggable attention backends.
```
FastPLMs/
  fastplms/                   # Main package
    esm2/                     # ESM2 (Meta AI)
    esm_plusplus/             # ESM++ / ESMC (EvolutionaryScale)
    e1/                       # E1 (Profluent Bio)
    dplm/                     # DPLM (ByteDance)
    dplm2/                    # DPLM2 (ByteDance)
    boltz/                    # Boltz2 (structure prediction)
    esmfold/                  # ESMFold (structure prediction)
    attention.py              # Shared attention backend code
    embedding_mixin.py        # Shared pooling & embedding utilities
    weight_parity_utils.py    # Weight comparison utilities
    fine_tuning_example.py    # LoRA fine-tuning example
  official/                   # Official reference repos (git submodules)
    boltz/                    # Official Boltz
    e1/                       # Official E1
    dplm/                     # Official DPLM
    transformers/             # Official HF transformers
  entrypoint_setup.py         # PyTorch runtime config
  testing/                    # Test suite + benchmarks
    official/                 # Official model loaders for compliance
  docs/                       # Documentation
```
Each model family lives in its own package directory containing:
| File | Purpose |
|---|---|
| `modeling_*.py` | HuggingFace-compatible `PreTrainedModel` + `PretrainedConfig` subclasses |
| `get_*_weights.py` | Script to convert official checkpoints to FastPLM format |
| `README.md` | Per-model HuggingFace card README |
| `LICENSE` | Per-model license file |
| `__init__.py` | Package init (often minimal; models load via `trust_remote_code`) |
All FastPLMs models are distributed on the HuggingFace Hub and loaded with `trust_remote_code=True`:

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained(
    "Synthyra/ESM2-150M",
    trust_remote_code=True,
)
```

When `trust_remote_code=True` is passed, HuggingFace downloads the `modeling_*.py` file from the Hub repo and executes it locally. The Hub copy is kept in sync with the canonical copy in this repository via `update_HF.py`.
The model's config.json on the Hub contains an auto_map entry that tells AutoModel which class to instantiate:
```json
{
  "auto_map": {
    "AutoConfig": "modeling_fastesm.FastEsmConfig",
    "AutoModelForMaskedLM": "modeling_fastesm.FastEsmForMaskedLM"
  }
}
```

Every sequence model (ESM2, ESM++, E1, DPLM, DPLM2) inherits from `EmbeddingMixin` (`fastplms/embedding_mixin.py`), which provides:

- `embed_dataset()`: Batch embedding pipeline with pooling, SQLite/pth storage, FASTA parsing, and deduplication
- `_embed()`: Abstract method implemented by each model to return last hidden states
- `load_embeddings_from_pth()` / `load_embeddings_from_db()`: Load previously saved embeddings
The mixin supports two modes:
- **Tokenizer mode** (ESM2, ESM++, DPLM, DPLM2): The caller provides a tokenizer; `_embed(input_ids, attention_mask)` is called
- **Sequence mode** (E1): The caller passes `tokenizer=None`; `_embed(sequences, return_attention_mask=True)` is called, which returns `(embeddings, mask)`
See Embedding & Pooling API for full details.
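The two modes can be sketched with a toy model. Everything below is invented for illustration (the class, the dummy embeddings, and the toy tokenizer); only the `_embed` call shapes follow the description above:

```python
# Toy sketch of EmbeddingMixin's two calling modes; internals are invented.
class ToyEmbeddingModel:
    hidden_size = 4

    def _embed(self, x, attention_mask=None, return_attention_mask=False):
        if return_attention_mask:
            # Sequence mode: x is a list of raw sequences; the model
            # tokenizes internally and returns (embeddings, mask).
            max_len = max(len(s) for s in x)
            embeddings = [[[0.0] * self.hidden_size] * max_len for _ in x]
            mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in x]
            return embeddings, mask
        # Tokenizer mode: x is pre-tokenized input_ids.
        return [[[0.0] * self.hidden_size] * len(ids) for ids in x]

    def embed(self, sequences, tokenizer=None):
        if tokenizer is not None:
            input_ids, attention_mask = tokenizer(sequences)
            return self._embed(input_ids, attention_mask=attention_mask)
        return self._embed(sequences, return_attention_mask=True)


def toy_tokenizer(sequences):
    """Pad to the max length in the batch; id 0 is padding (illustrative)."""
    max_len = max(len(s) for s in sequences)
    ids = [[ord(c) for c in s] + [0] * (max_len - len(s)) for s in sequences]
    mask = [[1 if t != 0 else 0 for t in row] for row in ids]
    return ids, mask
```

Passing a tokenizer selects tokenizer mode; passing `tokenizer=None` selects sequence mode and yields the `(embeddings, mask)` pair.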
All models share a common attention backend abstraction controlled by config.attn_backend. Four backends are available:
| Backend | Key | Numerics | Speed |
|---|---|---|---|
| PyTorch SDPA | `"sdpa"` | Exact | Fast |
| Flash Attention | `"kernels_flash"` | Approximate | Fastest |
| Flex Attention | `"flex"` | Near-exact | Very fast |
| Auto | `"auto"` | Varies | Best available |
Each model's attention layer stores an AttentionBackend enum and dispatches accordingly. See Attention Backends for implementation details.
Backend setting differs by model family:
- **ESM2, ESM++, E1**: Set on the config before calling `from_pretrained`
- **DPLM, DPLM2**: Expose a mutable `model.attn_backend` property that propagates to all layers
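The DPLM-style mutable property can be sketched as follows. The enum values come from the backend table above; the layer and model classes are invented stand-ins, not the real implementation:

```python
from enum import Enum


class AttentionBackend(Enum):
    # Keys match the backend table; the enum layout itself is assumed.
    SDPA = "sdpa"
    FLASH = "kernels_flash"
    FLEX = "flex"
    AUTO = "auto"


class ToyLayer:
    def __init__(self, backend):
        self.backend = backend


class ToyDPLM:
    """Toy sketch: assigning model.attn_backend propagates to every layer."""

    def __init__(self, n_layers=3):
        self.layers = [ToyLayer(AttentionBackend.SDPA) for _ in range(n_layers)]

    @property
    def attn_backend(self):
        return self.layers[0].backend

    @attn_backend.setter
    def attn_backend(self, value):
        # Accept either a string key ("flex") or an AttentionBackend member.
        backend = value if isinstance(value, AttentionBackend) else AttentionBackend(value)
        for layer in self.layers:
            layer.backend = backend
```

A setter that fans out to all layers lets the backend be switched on an already-loaded model, which the config-based families cannot do after `from_pretrained`.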
entrypoint_setup.py configures PyTorch runtime defaults for optimal GPU performance:
- TensorFloat32 matmul precision (`torch.set_float32_matmul_precision('high')`)
- TF32 enabled for matmul and cuDNN
- cuDNN autotuner (`benchmark=True`)
- Deterministic mode off for speed
- Inductor max-autotune GEMM backends (ATEN, CUTLASS, FBGEMM)
- Dynamo scalar output capture and recompile limit
This module is imported at the top of standalone scripts (`throughput.py`, `compliance.py`) but is not imported by the model files themselves.
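The core of those settings can be sketched with real PyTorch APIs. This is a reconstruction from the list above, not the contents of `entrypoint_setup.py` itself, and it omits the Inductor/Dynamo knobs:

```python
import torch


def setup_runtime() -> None:
    """Sketch of the runtime defaults described above (reconstruction,
    not the actual entrypoint_setup.py)."""
    torch.set_float32_matmul_precision("high")    # TensorFloat32 matmul precision
    torch.backends.cuda.matmul.allow_tf32 = True  # TF32 for matmul
    torch.backends.cudnn.allow_tf32 = True        # TF32 for cuDNN
    torch.backends.cudnn.benchmark = True         # cuDNN autotuner
    torch.use_deterministic_algorithms(False)     # determinism off for speed
```

Because these are process-wide globals, calling such a function once at script startup (as the standalone scripts do) is simpler than threading the settings through every model file.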
The Dockerfile uses:
- Base image: `nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04` with Python 3.12
- Source code: Copied to `/app` (`PYTHONPATH=/app`)
- Runtime workdir: `/workspace` for outputs, caches, and volume mounts
- Caches: `HF_HOME=/workspace/.cache/huggingface`, `TORCH_HOME=/workspace/.cache/torch`
- Compliance deps: Official repos installed via `pip install -e` from `official/` submodules
Each model family has a `get_*_weights.py` script that:

- Loads the official checkpoint (from HuggingFace or a local file)
- Remaps parameter names and shapes to match the FastPLM architecture
- Exports `config.json`, `pytorch_model.bin`, and the modeling source files

The exported directory can be pushed to HuggingFace via `update_HF.py`.
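The renaming step can be sketched as a regex-driven rewrite of state-dict keys. The rule and parameter names below are hypothetical, chosen only to show the shape of the conversion, not any real FastPLM mapping:

```python
import re


def remap_state_dict(official_sd, rules):
    """Toy sketch of the renaming step in a get_*_weights.py script:
    rewrite official parameter names into the target layout via regex rules."""
    remapped = {}
    for name, tensor in official_sd.items():
        for pattern, replacement in rules:
            name = re.sub(pattern, replacement, name)
        remapped[name] = tensor
    return remapped


# Hypothetical rule: official "encoder.layer.N." prefix -> "layers.N."
rules = [(r"^encoder\.layer\.(\d+)\.", r"layers.\1.")]
official = {"encoder.layer.0.attention.weight": [[0.0, 0.0]]}
converted = remap_state_dict(official, rules)
```

Keeping the mapping as data (a list of rules) rather than hard-coded renames makes each family's conversion script easy to audit against the official checkpoint layout.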
Each family has a corresponding module in `testing/official/` (e.g., `testing/official/esm2.py`) that wraps the original model in a standardized interface returning `(model, tokenizer)`. This allows the compliance test suite to load both implementations side by side and compare:
- Weight parity: Bit-exact MSE comparison of state dicts
- Forward compliance: Logits MSE and prediction accuracy across random batches
See Testing & Benchmarking for details on running compliance tests.
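The forward-compliance metrics can be illustrated in miniature. This pure-Python sketch is invented for clarity; the real suite presumably compares torch tensors over random batches:

```python
def compliance_metrics(ref_logits, fast_logits):
    """Toy version of the forward-compliance checks: mean squared error
    between logit matrices and argmax-agreement (prediction) accuracy."""
    n_rows, n_cols = len(ref_logits), len(ref_logits[0])
    mse = sum(
        (r - f) ** 2
        for ref_row, fast_row in zip(ref_logits, fast_logits)
        for r, f in zip(ref_row, fast_row)
    ) / (n_rows * n_cols)

    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    agree = sum(argmax(r) == argmax(f) for r, f in zip(ref_logits, fast_logits))
    return mse, agree / n_rows
```

A faithful reimplementation should drive the MSE toward zero and the argmax agreement toward 1.0 on the same inputs.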