Fast, multi-layer protein language model embeddings extractor for ESM-2 and ESM-C.
Extract mean-pooled residue embeddings from any ESM model in a single forward pass, with first-class support for multi-layer extraction, bfloat16 weights, Flash Attention 2, and SLURM array jobs.
Note
This library grew out of several personal research projects and working scripts that were consolidated, migrated, compiled, and refactored into one place (hence the quick repo publication time). It is not meant to be a general-purpose ESM SDK; the scope is intentionally narrow: efficient, multi-layer embedding extraction for large-scale representation experiments. If something looks familiar, it probably is. PRs are welcome :)
- Models
- Installation
- Quick Start
- Scripts
- Output Format
- Default Ablation Layer Sets
- Performance Notes
- Pooling Convention
- Tests
- Citation
- License
| Model | Family | Layers | Embedding dim | Source |
|---|---|---|---|---|
| esm2_8M | ESM-2 | 6 | 320 | HuggingFace |
| esm2_35M | ESM-2 | 12 | 480 | HuggingFace |
| esm2_150M | ESM-2 | 30 | 640 | HuggingFace |
| esm2_650M | ESM-2 | 33 | 1280 | HuggingFace |
| esmc_300m | ESM-C | 30 | 960 | evolutionaryscale/esm → BioHub |
| esmc_600m | ESM-C | 36 | 1152 | evolutionaryscale/esm → BioHub |
Note
The initial code was based on the evolutionaryscale/esm SDK. Since the transfer of ESM-C weights to BioHub there may be some divergence. If you encounter any discrepancies please open an issue or submit a pull request.
Requires Python ≥ 3.13 and uv.
git clone https://github.com/Aaryesh-AD/esm-embed
cd esm-embed
uv sync
source .venv/bin/activate

Verify the install:
# Run using uv
uv run esm-embed
# or directly
esm-embed --help
esm-embed models

No GPU? Everything works on CPU, but expect embedding to be ~50–200× slower. ESM-C weights are downloaded from BioHub on first use and cached under ~/.cache/esm/. No API token is required.
# Embed a FASTA (default model: esm2_650M, default layer)
esm-embed embed proteins.fasta
# Specific model + layer
esm-embed embed proteins.fasta --model esm2_150M --layer 24 --output out.npy
# Multi-layer ablation (saves one .npy per layer)
esm-embed embed proteins.fasta \
--model esm2_650M \
--layers 9,18,27,30,33 \
--output-dir ./embeddings/
# ESM-C
esm-embed embed proteins.fasta --model esmc_300m --batch-size 8
# bfloat16 + Flash Attention 2 (ESM-2 only)
esm-embed embed proteins.fasta --model esm2_650M --half --sdpa
# List models / inspect ablation layers
esm-embed models
esm-embed info esm2_650M

from esm_embed import embed, embed_multilayer
seqs = ["ACDEFGHIKLMNPQRSTVWY", "MKTIIALSYIFCLVFA"]
# Single-layer (last recommended layer per model)
embs = embed(seqs, model="esm2_650M")
print(embs.shape) # (2, 1280)
# Multi-layer: all ablation layers in one forward pass
layer_embs = embed_multilayer(seqs, model="esm2_650M")
# {9: (2, 1280), 18: (2, 1280), 27: (2, 1280), 30: (2, 1280), 33: (2, 1280)}
# Custom layers
layer_embs = embed_multilayer(seqs, model="esm2_650M", layers=[18, 33])

from esm_embed import ESM2Embedder
embedder = ESM2Embedder(
"esm2_650M",
half=True, # bfloat16 weights: halves VRAM, ~21% faster
use_sdpa=True, # Flash Attention 2: 2–4× faster on sequences > 256 aa
)
embs = embedder.embed(seqs, batch_size=32)

| Script | Purpose |
|---|---|
| scripts/embed_sequences.py | Embed one FASTA or CSV (any model, any layers) |
| scripts/embed_batch.py | Multi-protein × multi-model batch runner (local GPU, resume-aware) |
| scripts/verify_embeddings.py | Check output shapes, NaN/Inf, completeness |
| slurm/embed_array.sbatch | SLURM array job (one task per protein) |
| slurm/submit_all_models.sh | Submit one array per model to SLURM |
For large-scale runs with many proteins and multiple models:
# proteins.csv must have 'id' and 'filename' columns
uv run python scripts/embed_batch.py \
--proteins proteins.csv \
--dms-dir data/sequences/ \
--output-dir embeddings/ \
--mode ablation # or 'primary' for one layer per model
# Single model, resume from idx 60
uv run python scripts/embed_batch.py \
--proteins proteins.csv \
--model esm2_650M \
--start-idx 60
# Dry-run: see plan without embedding
uv run python scripts/embed_batch.py --proteins proteins.csv --dry-run

The batch runner:
- Loads each model once and streams all proteins through it (vs SLURM, which re-loads per task)
- Auto-detects already-done .npy files and skips them (full resume)
- Prefetches tokenisation in a background thread while the GPU runs the forward pass
- Halves batch size and retries on OOM
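
The OOM fallback in the last bullet can be sketched roughly as follows. This is an illustration, not the code in scripts/embed_batch.py: `run_with_oom_retry`, `embed_fn` and `fake_embed` are hypothetical names, and `MemoryError` stands in for the GPU out-of-memory exception (with PyTorch that would typically be `torch.cuda.OutOfMemoryError`).

```python
from typing import Callable, List, Sequence


def run_with_oom_retry(
    seqs: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[list]],
    batch_size: int = 32,
    oom_error: type = MemoryError,  # stand-in; assumption, swap in the real OOM exception
) -> List[list]:
    """Embed seqs in batches, halving the batch size whenever a batch runs out of memory."""
    out: List[list] = []
    i = 0
    while i < len(seqs):
        batch = seqs[i : i + batch_size]
        try:
            out.extend(embed_fn(batch))
        except oom_error:
            if batch_size == 1:
                raise  # a single sequence does not fit; nothing left to halve
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with the smaller batch
        i += len(batch)
    return out


# Toy embed function (hypothetical) that "runs out of memory" above 4 sequences.
def fake_embed(batch):
    if len(batch) > 4:
        raise MemoryError("simulated OOM")
    return [[float(len(s))] for s in batch]


result = run_with_oom_retry(
    ["ACD", "MK", "GG", "A", "MKT", "ACDE"], fake_embed, batch_size=16
)
print(result)
```

In this sketch the reduced batch size is kept for the rest of the run, which avoids hitting the same failure repeatedly on later batches.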
Note
The SLURM script is a template based on Georgia Tech's PACE cluster. You may need to modify resource requests, array indexing, and module loading for your HPC environment.
# Ablation run for all 6 models
MODE=ablation bash slurm/submit_all_models.sh
# Monitor
squeue -u $USER

Edit slurm/embed_array.sbatch to set #SBATCH --array=0-N to match your protein count.
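
To work out N from a protein list, something like the following may help. It assumes one array task per CSV row with zero-based task IDs, so N is the row count minus one, and it assumes the same 'id' and 'filename' columns that embed_batch.py expects. `array_directive` is a hypothetical helper and the file names in the example are made up.

```python
import csv
import io


def array_directive(csv_text: str) -> str:
    """Return the #SBATCH --array line for a proteins CSV with 'id' and 'filename' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"id", "filename"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    n_rows = sum(1 for _ in reader)
    if n_rows == 0:
        raise ValueError("no proteins listed")
    # Zero-based task IDs (assumption): the last index is n_rows - 1.
    return f"#SBATCH --array=0-{n_rows - 1}"


example = "id,filename\nGFP_AVIC,GFP_AVIC.fasta\nP53_HUMAN,P53_HUMAN.fasta\n"
directive = array_directive(example)
print(directive)
```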
All embeddings are saved as .npy files:
{protein_id}_{model}_layer{k}.npy → shape (N, D) dtype float32
where N is the number of sequences and D is the model embedding dimension.
Example directory:
embeddings/
GFP_AVIC_esm2_650M_layer9.npy (3809, 1280)
GFP_AVIC_esm2_650M_layer18.npy (3809, 1280)
GFP_AVIC_esm2_650M_layer33.npy (3809, 1280)
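
Because both protein IDs and model names can contain underscores, splitting a file name on `_` is ambiguous. A minimal sketch of one way to recover the three fields, assuming the naming scheme above and the model names listed in this README (`parse_embedding_filename` is a hypothetical helper, not part of the package):

```python
import re
from typing import NamedTuple

# Model names as listed in this README.
MODELS = ("esm2_8M", "esm2_35M", "esm2_150M", "esm2_650M", "esmc_300m", "esmc_600m")

_PATTERN = re.compile(
    r"^(?P<protein_id>.+)_(?P<model>"
    + "|".join(re.escape(m) for m in MODELS)
    + r")_layer(?P<layer>\d+)\.npy$"
)


class EmbeddingFile(NamedTuple):
    protein_id: str
    model: str
    layer: int


def parse_embedding_filename(name: str) -> EmbeddingFile:
    """Split '{protein_id}_{model}_layer{k}.npy' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an embedding file name: {name!r}")
    return EmbeddingFile(match["protein_id"], match["model"], int(match["layer"]))


parsed = parse_embedding_filename("GFP_AVIC_esm2_650M_layer9.npy")
print(parsed)
```

Anchoring the pattern on the known model names keeps underscores inside the protein ID intact.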
Load with NumPy:
import numpy as np
embs = np.load("embeddings/GFP_AVIC_esm2_650M_layer33.npy")
print(embs.shape) # (3809, 1280)

Layer indices extracted when --layers is not specified.
They correspond to approximately 25%, 50%, 75%, and 100% of model depth, following Valeriani et al. (NeurIPS 2023).
| Model | Ablation layers |
|---|---|
| esm2_8M | 2, 4, 5, 6 |
| esm2_35M | 4, 8, 10, 12 |
| esm2_150M | 8, 16, 24, 30 |
| esm2_650M | 9, 18, 27, 30, 33 |
| esmc_300m | 7, 14, 21, 29 |
| esmc_600m | 8, 17, 26, 35 |
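
To see how closely each set tracks the quarter-depth targets, the ratio of layer index to layer count can be computed directly from the two tables in this README. This is plain arithmetic on the listed numbers; how the indices map onto the model's hidden states (for example, whether index 0 is the embedding layer) is not assumed here.

```python
# Layer counts and default ablation layers, copied from the tables in this README.
NUM_LAYERS = {
    "esm2_8M": 6, "esm2_35M": 12, "esm2_150M": 30,
    "esm2_650M": 33, "esmc_300m": 30, "esmc_600m": 36,
}
ABLATION_LAYERS = {
    "esm2_8M": [2, 4, 5, 6],
    "esm2_35M": [4, 8, 10, 12],
    "esm2_150M": [8, 16, 24, 30],
    "esm2_650M": [9, 18, 27, 30, 33],
    "esmc_300m": [7, 14, 21, 29],
    "esmc_600m": [8, 17, 26, 35],
}


def depth_fractions(model: str) -> list:
    """Return each default ablation layer as a fraction of the model's layer count."""
    total = NUM_LAYERS[model]
    return [round(layer / total, 2) for layer in ABLATION_LAYERS[model]]


for name in ABLATION_LAYERS:
    print(name, depth_fractions(name))
```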
| Optimisation | Speedup | How to enable |
|---|---|---|
| bfloat16 weights | ~21% faster, half VRAM | half=True or --half |
| SDPA / Flash Attention 2 | 2–4× on seqs > 256 aa | use_sdpa=True or --sdpa |
| Tokenisation prefetching | hides CPU bottleneck on small models | automatic in embed_batch.py |
| torch.compile() | ~20–30% extra (after warm-up) | compile=True in batch runner |
Recommended for most CUDA setups: half=True + use_sdpa=True.
torch.compile() is disabled by default because SDPA already provides the dominant speedup, and enabling both causes a 6+ min warm-up.
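
The tokenisation prefetching listed in the table is a producer/consumer pattern: a worker thread prepares the next batches while the main thread is busy with the current one. A minimal sketch using only the standard library, not the implementation in embed_batch.py; `prefetch` and `toy_tokenise` are hypothetical names and the "tokeniser" is a stand-in.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def prefetch(items: Iterable, prepare: Callable, depth: int = 2) -> Iterator:
    """Yield prepare(item) for each item, computing up to `depth` results ahead in a thread."""
    q: queue.Queue = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for item in items:
                q.put(prepare(item))  # blocks once `depth` results are waiting
            q.put(_DONE)
        except Exception as exc:  # hand errors to the consumer instead of failing silently
            q.put(exc)

    # Daemon thread: if the consumer stops early, the worker does not block exit.
    threading.Thread(target=worker, daemon=True).start()
    while True:
        result = q.get()
        if result is _DONE:
            return
        if isinstance(result, Exception):
            raise result
        yield result


def toy_tokenise(batch):
    # Stand-in tokeniser: maps each character to its code point.
    return [[ord(c) for c in seq] for seq in batch]


batches = [["ACD", "MK"], ["GG"]]
tokenised = list(prefetch(batches, toy_tokenise))
print(tokenised)
```

The bounded queue keeps memory use predictable: the worker never runs more than `depth` batches ahead of the consumer.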
ESM-C embeddings are extracted by calling ESMC.forward() directly with a (B, L) token tensor,
bypassing the EvolutionaryScale SDK's encode() + logits() wrapper, which forces batch size 1.
This gives ~2.4× speedup at batch size 8 on sequences of ~500 residues.
Recommended batch sizes for 8 GB VRAM: esmc_300m: 8 | esmc_600m: 8
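
Feeding a (B, L) tensor means variable-length sequences have to be padded to a common length first. A minimal NumPy sketch of that step, under stated assumptions: the token IDs and `PAD_ID` below are made up for illustration (the real values come from the model's tokenizer), and `pad_batch` is a hypothetical helper rather than this library's code.

```python
import numpy as np

PAD_ID = 1  # assumed padding id, for illustration only


def pad_batch(token_lists, pad_id=PAD_ID):
    """Right-pad token id lists to a (B, L) array and return it with a boolean mask."""
    max_len = max(len(tokens) for tokens in token_lists)
    tokens = np.full((len(token_lists), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(token_lists), max_len), dtype=bool)
    for i, seq_tokens in enumerate(token_lists):
        tokens[i, : len(seq_tokens)] = seq_tokens
        mask[i, : len(seq_tokens)] = True  # True marks real (non-pad) tokens
    return tokens, mask


# Two already-tokenised sequences of different lengths (ids are invented).
tokens, mask = pad_batch([[0, 5, 6, 7, 2], [0, 8, 2]])
print(tokens.shape)
```

The mask is what later lets pooling ignore padded positions.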
All embeddings are mean-pooled over residue positions, excluding the BOS (position 0) and EOS tokens, following Valeriani et al. (NeurIPS 2023). Pooling is fully vectorised on the GPU (no Python loop per sequence).
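
The pooling convention can be illustrated with a masked mean. The sketch below uses NumPy on the CPU rather than the library's GPU code and assumes right-padded batches in which BOS sits at position 0 and EOS is the last non-pad token; `mean_pool` is a hypothetical name.

```python
import numpy as np


def mean_pool(hidden, lengths):
    """Mean over residue positions only.

    hidden:  (B, L, D) hidden states, BOS at position 0, right-padded.
    lengths: (B,) token counts per sequence including BOS and EOS (each >= 3).
    """
    hidden = np.asarray(hidden, dtype=np.float32)
    lengths = np.asarray(lengths)
    positions = np.arange(hidden.shape[1])[None, :]                 # (1, L)
    # Keep positions after BOS (0) and before EOS (lengths - 1); padding is excluded too.
    mask = (positions >= 1) & (positions < lengths[:, None] - 1)    # (B, L)
    summed = (hidden * mask[:, :, None]).sum(axis=1)                # (B, D)
    counts = mask.sum(axis=1, keepdims=True).astype(np.float32)     # (B, 1)
    return summed / counts


# B=2, L=5, D=2: the first sequence has 3 residues, the second has 1 (then padding).
hidden = np.arange(20, dtype=np.float32).reshape(2, 5, 2)
pooled = mean_pool(hidden, lengths=[5, 3])
print(pooled)
```

The mask replaces a per-sequence loop: one broadcasted multiply and one sum handle the whole batch.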
uv run pytest tests/ -v

Tests use esm2_8M (32 MB) on CPU to keep CI fast. The suite checks:
- Output shapes and dtype
- No NaN / Inf values
- Multi-layer key consistency
- Single-layer vs multi-layer numerical agreement
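
Checks of this kind can be expressed on plain arrays. The following is a sketch, not the repository's test code or scripts/verify_embeddings.py; `check_embeddings` and `layers_agree` are hypothetical helpers, and the tolerance is an assumed value.

```python
import numpy as np


def check_embeddings(embs: np.ndarray, n_sequences: int, dim: int) -> list:
    """Return a list of problems found in an embedding array (empty if none)."""
    problems = []
    if embs.shape != (n_sequences, dim):
        problems.append(f"unexpected shape {embs.shape}")
    if embs.dtype != np.float32:
        problems.append(f"unexpected dtype {embs.dtype}")
    if not np.isfinite(embs).all():
        problems.append("contains NaN or Inf values")
    return problems


def layers_agree(single: np.ndarray, multi: dict, layer: int, atol: float = 1e-5) -> bool:
    """Compare a single-layer result with the same layer from a multi-layer result."""
    return bool(np.allclose(single, multi[layer], atol=atol))


good = np.zeros((2, 1280), dtype=np.float32)
bad = good.copy()
bad[0, 0] = np.nan

print(check_embeddings(good, n_sequences=2, dim=1280))
print(check_embeddings(bad, n_sequences=2, dim=1280))
print(layers_agree(good, {33: good.copy()}, layer=33))
```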
Note
These tests are not representative of a full-scale testing suite. We recommend adding test cases tailored to your specific use case.
If you use this code in your research, please cite the relevant ESM papers.
ESM-2
@ARTICLE{Lin2023-tw,
title = "Evolutionary-scale prediction of atomic-level protein structure
with a language model",
author = "Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and
Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil,
Robert and Kabeli, Ori and Shmueli, Yaniv and Dos Santos Costa,
Allan and Fazel-Zarandi, Maryam and Sercu, Tom and Candido,
Salvatore and Rives, Alexander",
journal = "Science",
publisher = "American Association for the Advancement of Science (AAAS)",
volume = 379,
number = 6637,
pages = "1123--1130",
month = mar,
year = 2023,
language = "en"
}

ESM-C
@misc{candido2026language,
title = {Language Modeling Materializes a World Model of Protein Biology},
author = {Candido, Salvatore and Hayes, Thomas and Derry, Alexander and Rao, Roshan
and Lin, Zeming and Verkuil, Robert and Wu, Bryan and Lee, Jin Sub
and Bruguera, Elise S. and Keval, Jehan A. and Kopylov, Mykhailo
and Pak, John E. and Wu, Wesley and Thomas, Neil and Mataraso, Samson
and Hsu, Alvin and Trotman-Grant, Ashton C. and Fatras, Kilian
and dos Santos Costa, Allan and Badkundri, Rohil and Ak{\i}n, Halil
and Oktay, Deniz and Deaton, Jonathan and Montabana, Elizabeth
and Sitwala, Hrishita and Yu, Yue and Wiggert, Marius
and Carlin, Dylan Alexander and Goering, Anthony W. and Blazejewski, Tomasz
and Sandora, McCullen and Hla, Michael and Jia, Tina Z.
and Kloker, Leon H. and Sofroniew, Nicholas J. and Uehara, Masatoshi
and Pannu, Jassi and Bachas, Sharrol and Liu, Daniel S.
and Sercu, Tom and Rives, Alexander},
year = {2026},
url = {https://biohub.ai/papers/esm_protein.pdf},
note = {Preprint}
}

ESM-C weights
@software{evolutionaryscale_2024,
author = {{EvolutionaryScale Team}},
title = {evolutionaryscale/esm},
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.14219303},
URL = {https://doi.org/10.5281/zenodo.14219303}
}

Pooling convention
@INPROCEEDINGS{Valeriani2023-tr,
title = "The geometry of hidden representations of large transformer
models",
author = "Valeriani, Lucrezia and Doimo, Diego and Cuturello,
Francesca and Laio, Alessandro and Ansuini, Alessio and
Cazzaniga, Alberto",
month = feb,
year = 2023,
copyright = "http://creativecommons.org/licenses/by/4.0/",
archivePrefix = "arXiv",
primaryClass = "cs.LG",
eprint = "2302.00294"
}

Distributed under the MIT License.