PLMGuard is a diagnostic framework for protein sequence search that probes whether similarity scores are biologically meaningful, semantically coherent, and resistant to manipulation. It helps distinguish trustworthy search signals from opaque embedding-based similarity across six complementary experiments.
Download the required assets (links below) and arrange the project as follows:
PLMGuard/
├── data/
│ ├── db/ # Reference databases
│ │ ├── astral.fa # ASTRAL sequence database
│ │ └── astral_pdb/ # ASTRAL PDB structures
│ │ └── astral40/pdbstyle-2.08/
│ ├── parsed_result/ # Parsed search results
│ └── rosetta_mut/ # Rosetta-relaxed mutant PDB structures (Exp. 2)
├── libs/ # Third-party tool source code
├── results/ # Output figures and metrics (auto-created)
└── src/ # Source code (this repository)
- Reference databases, library dependencies (PLM methods), and Rosetta-mutated structures: download from Zenodo
- PDB structures: download from pdbstyle-2.08
- Parsed search results: see the Reproduce Figures from the Paper section
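Before running anything, it can help to confirm that the downloaded assets landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the path list assumes that `astral40/pdbstyle-2.08/` sits inside `astral_pdb/` as the tree suggests.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the PLMGuard root.
# results/ is omitted because it is created automatically.
EXPECTED_PATHS = [
    "data/db/astral.fa",
    "data/db/astral_pdb/astral40/pdbstyle-2.08",
    "data/parsed_result",
    "data/rosetta_mut",
    "libs",
    "src",
]


def missing_paths(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Hypothetical usage: run from the PLMGuard root directory.
    for path in missing_paths("."):
        print("missing:", path)
```

An empty result means every expected file and directory is in place.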
Create the base evaluation environment:
conda create -n PLMGuard python=3.10
conda activate PLMGuard
pip install -r requirements.txt
Each PLM-based search tool requires its own environment. Follow the per-tool instructions in the linked library README. The tool-specific Python executables are set via *_PYTHON_PATH variables in .env.
Additionally, build TMscore and place the binary at libs/TMscore:
wget https://zhanggroup.org/TM-score/TMscore.cpp -O libs/TMscore.cpp
g++ -static -O3 -ffast-math -lm -o libs/TMscore libs/TMscore.cpp
Steps 3 and 4 are only required for running the protein similarity search methods. If you do not intend to use these methods, you may safely skip these steps.
Edit src/PLMs_cmds/.env and set BASE_DIR to your PLMGuard root:
# src/PLMs_cmds/.env
export BASE_DIR="/path/to/PLMGuard" # <-- update this line
export DATA_DIR="$BASE_DIR/data"
export TEMP_DIR="$DATA_DIR/temp"
The remaining variables in .env (Python paths, GPU assignments, library paths) should be updated to match your installation of each tool.
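To double-check how the variables in .env resolve after editing BASE_DIR, a small parser can expand the `$NAME` references. This is only an illustrative sketch: `parse_env_text` is a hypothetical helper, not something the repository provides, and it handles only simple `export NAME="value"` lines like the ones shown above.

```python
import re

# Matches lines such as: export NAME="value"   # optional comment
_ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
_VAR_RE = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def parse_env_text(text):
    """Parse simple shell-style assignments and expand earlier variables."""
    values = {}
    for line in text.splitlines():
        match = _ASSIGN_RE.match(line)
        if not match:
            continue  # comment or blank line
        name, raw = match.groups()
        raw = raw.split(" #", 1)[0].strip()  # drop a trailing comment
        raw = raw.strip('"').strip("'")      # drop surrounding quotes

        def expand(m):
            # Replace $NAME or ${NAME} with a value defined on an earlier line.
            return values.get(m.group(1) or m.group(2), m.group(0))

        values[name] = _VAR_RE.sub(expand, raw)
    return values


if __name__ == "__main__":
    with open("PLMs_cmds/.env") as handle:  # path relative to src/
        for key, value in parse_env_text(handle.read()).items():
            print(f"{key} = {value}")
```

Printing the resolved values makes it easy to spot a BASE_DIR that still points at the placeholder path.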
PLM-based methods — download our modified versions from zenodo, place in libs/, and follow the per-tool environment setup:
| Method | Environment variable |
|---|---|
| DCTdomain | DCT_DOMAIN_PYTHON_PATH, DCT_DOMAIN_SRC_DIR |
| DHR | DHR_PYTHON_PATH, DHR_SRC_DIR |
| PLMSearch | PLMSEARCH_PYTHON_PATH, PLMSEARCH_SRC_DIR |
| TM-Vec | TMVEC_PYTHON_PATH, TMVEC_SRC_DIR |
Alignment-based methods — install from official sources at the versions used in the paper:
| Method | Version | Source |
|---|---|---|
| BLASTp | 2.17.0 | https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ |
| MMseqs2 | 18.8cc5cs | https://github.com/soedinglab/MMseqs2 |
| Diamond | 2.1.14 | https://github.com/bbuchfink/diamond |
| NEAR | — | https://github.com/TravisWheelerLab/NEAR |
Since running all search methods is time-consuming, we provide precomputed parsed search results so you can reproduce all figures directly.
- Download the precomputed parsed results from the PLMGuard dataset on Hugging Face and place the contents in data/
- Run from the src/ directory:
cd src
python sanity_check/check_bio_evolution.py # Experiment 1
python sanity_check/check_bio_mut_structure.py # Experiment 2
python sanity_check/check_pert_doublen.py # Experiment 3
python sanity_check/check_pert_truncation.py # Experiment 4
python sanity_check/check_perm_data.py # Experiment 5
python sanity_check/check_perm_model.py # Experiment 6
python sanity_check/overall_performance.py # Overall summary bubble plot
All figures and metrics are saved under results/.
Note: This repository is designed to test the methods evaluated in the paper. A standalone tool for testing custom search methods is coming soon.
Generate all FASTA variants required by the six experiments from the base astral.fa. Run from src/:
cd src
# Evolutionary mutants — Experiment 1 (produces astral_mutantWAG.fa)
python utils/fasta_utils.py --fasta_url ../data/db/astral.fa --type mutant
# Doubled sequences — Experiment 3 (produces astral_doubshuf.fa and astral_doubself.fa)
python utils/fasta_utils.py --fasta_url ../data/db/astral.fa --type doublen
# Truncated sequences — Experiment 4 (produces astral_trunchalf.fa and astral_truncqrt.fa)
python utils/fasta_utils.py --fasta_url ../data/db/astral.fa --type trunclen
# Shuffled database — Experiment 5
# Step 1: generate shuffled sequences (produces astral_shuf.fa)
python utils/fasta_utils.py --fasta_url ../data/db/astral.fa --type shuf
# Step 2: concatenate with originals to produce the decoy database used in Exp. 5
cat ../data/db/astral_shuf.fa ../data/db/astral.fa > ../data/db/astral_shuf_ori.fa
# Convert all FASTA files to TSV (required by DHR)
for fa in ../data/db/astral*.fa; do
python utils/fasta_utils.py --fasta_url "$fa" --type to_tsv
done
This produces files in data/db/: astral_mutantWAG.fa, astral_doubshuf.fa, astral_doubself.fa, astral_trunchalf.fa, astral_truncqrt.fa, astral_shuf.fa, and corresponding .tsv files.
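To make the perturbation types concrete, the sketch below shows one plausible reading of the shuffled, doubled, and truncated variants on a single sequence. It is not the code in utils/fasta_utils.py. The function names are hypothetical, and two details are assumptions: that the shuffle-doubled variant appends a shuffled copy of the sequence, and that truncation keeps a prefix of the sequence.

```python
import random


def shuffle_sequence(seq, rng):
    """Permute residues, keeping composition but destroying order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def double_self(seq):
    """Self-concatenation: the sequence followed by an identical copy."""
    return seq + seq


def double_shuffled(seq, rng):
    """Assumed form: the sequence followed by a shuffled copy of itself."""
    return seq + shuffle_sequence(seq, rng)


def truncate(seq, fraction):
    """Assumed form: keep the leading `fraction` of the residues."""
    keep = max(1, int(len(seq) * fraction))
    return seq[:keep]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    seq = "MKTAYIAKQR"      # toy sequence, not from ASTRAL
    print(double_self(seq))
    print(double_shuffled(seq, rng))
    print(truncate(seq, 0.5), truncate(seq, 0.25))
```

Passing an explicit `random.Random` instance keeps the shuffles reproducible across runs.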
All run_*.sh scripts accept these flags:
| Flag | Description |
|---|---|
| -q | Query file (.fa for most methods; .tsv for DHR) |
| -t | Target database file (.fa for most methods; .tsv for DHR) |
| -k | DB_KHITS: number of hits the search tool returns |
| -p | PARSER_KHITS: number of hits the parser keeps; determines the _hitN suffix in parsed result filenames |
DCTdomain divides each target sequence into multiple segments, which increases the effective number of targets. For DCTdomain, the -k parameter should therefore be scaled accordingly, i.e., -k = 3 × -p. For all other methods, -k equals -p.
Per-experiment -k/-p values (driven by target database size and analysis requirements):
| Experiment | Methods | -p | -k (DCTdomain) | -k (others) |
|---|---|---|---|---|
| 1 — Evolutionary Plausibility | all 8 | 1000 | 3000 | 1000 |
| 2 — Structure Consistency | all 8 | 15177 | 45531 | 15177 |
| 3 — Redundancy Stability | all 8 | 1000 | 3000 | 1000 |
| 4 — Similarity Monotonicity | all 8 | 1000 | 3000 | 1000 |
| 5 — Decoy Sensitivity | blastp, near, plm, tmvec, dctdomain, dhr | 30354 | 91062 | 30354 |
| 6 — Representation Reliability (perm only) | dctdomain, dhr, plm, tmvec | 15177 | 45531 | 15177 |
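The -k column follows directly from -p and the method, so it can be computed instead of looked up. A minimal sketch of that rule is below; the function name `db_khits` is hypothetical.

```python
# DCTdomain splits each target into segments, so it needs three times as many
# raw hits as the parser keeps (-k = 3 x -p). All other methods use -k = -p.
DCTDOMAIN_K_FACTOR = 3


def db_khits(method, parser_khits):
    """Return the -k value for `method` given the -p value."""
    if method == "dctdomain":
        return DCTDOMAIN_K_FACTOR * parser_khits
    return parser_khits


if __name__ == "__main__":
    for p in (1000, 15177, 30354):
        print(p, db_khits("dctdomain", p), db_khits("blastp", p))
```

The three -p values in the loop are the ones that appear in the table above.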
Tests whether similarity scores correctly track evolutionary distance across successive stages of amino-acid substitution.
Run search methods (all 8 methods; query = original or mutant, target = original database):
cd src
source PLMs_cmds/.env
# Run for each query in {astral.fa, astral_mutantWAG.fa}; target: astral.fa
# -k/-p: see table above (dctdomain: -k 3000 -p 1000; all others: -k 1000 -p 1000)
bash PLMs_cmds/run_blastp.sh -q ../data/db/astral.fa -t ../data/db/astral.fa -k 1000 -p 1000
# ... repeat for diamond, mmseq2, near, plm, tmvec, dctdomain
bash PLMs_cmds/run_dhr.sh -q ../data/db/astral.tsv -t ../data/db/astral.tsv -k 1000 -p 1000
Run sanity check:
python sanity_check/check_bio_evolution.py
Output saved to results/check_bio_evolution/.
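The search step above repeats one command per method and query, so it can be generated programmatically. The sketch below builds the command lines for Experiment 1 and prints them rather than running them. It assumes every method has a script named `run_<method>.sh` under PLMs_cmds/; only run_blastp.sh and run_dhr.sh are named explicitly in this document. The function name `build_commands` is hypothetical.

```python
METHODS = ["blastp", "diamond", "mmseq2", "near", "plm", "tmvec", "dctdomain", "dhr"]
DB_DIR = "../data/db"  # relative to src/


def build_commands(queries, target, parser_khits, methods=METHODS):
    """Return one argv list per (method, query) pair."""
    commands = []
    for method in methods:
        ext = ".tsv" if method == "dhr" else ".fa"  # DHR reads TSV input
        # DCTdomain needs -k = 3 x -p; every other method uses -k = -p.
        k = 3 * parser_khits if method == "dctdomain" else parser_khits
        for query in queries:
            commands.append([
                "bash", f"PLMs_cmds/run_{method}.sh",  # assumed script naming
                "-q", f"{DB_DIR}/{query}{ext}",
                "-t", f"{DB_DIR}/{target}{ext}",
                "-k", str(k),
                "-p", str(parser_khits),
            ])
    return commands


if __name__ == "__main__":
    # Experiment 1: queries astral and astral_mutantWAG against astral.
    for cmd in build_commands(["astral", "astral_mutantWAG"], "astral", 1000):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell after `source PLMs_cmds/.env`, or the argv lists can be passed to `subprocess.run` once the script names have been checked against the repository.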
Tests whether similarity scores reflect the structural relationships of mutant proteins, a task that is especially challenging in the remote-homology regime where sequence identity provides limited guidance.
Step 1 — Generate Rosetta mutants:
cd src
python utils/run_mut_rosetta.py
This samples 1000 sequences from data/db/astral.fa, runs Rosetta mutation and side-chain relaxation for each, and writes the mutated PDB files to data/rosetta_mut/.
Because generating the mutated structures is very time-consuming, we recommend skipping this script and using the pre-generated data under data/rosetta_mut/ instead.
Step 2 — Filter mutants and extract origin sequences:
python utils/filter_mut_rosetta.py \
--data-dir ../data/rosetta_mut
This scores each model with PyRosetta (CA-RMSD, fa_rep, backbone torsion, H-bonds) and produces:
- data/rosetta_mut/astral_mutrosetta.fa: quality-filtered mutant sequences (CA-RMSD < 3 Å)
- data/rosetta_mut/astral_originrosetta.fa: corresponding original sequences extracted from astral.fa
Copy both FASTA files to the database directory and convert to TSV for DHR:
cp ../data/rosetta_mut/astral_mutrosetta.fa ../data/db/
cp ../data/rosetta_mut/astral_originrosetta.fa ../data/db/
python utils/fasta_utils.py --fasta_url ../data/db/astral_mutrosetta.fa --type to_tsv
python utils/fasta_utils.py --fasta_url ../data/db/astral_originrosetta.fa --type to_tsv
Step 3 — Run search methods:
source PLMs_cmds/.env
# Run for each query in {astral_originrosetta.fa, astral_mutrosetta.fa}; target: astral.fa
# -k/-p: see table above (dctdomain: -k 45531 -p 15177; all others: -k 15177 -p 15177)
bash PLMs_cmds/run_blastp.sh -q ../data/db/astral_originrosetta.fa -t ../data/db/astral.fa -k 15177 -p 15177
# ... repeat for diamond, mmseq2, near, plm, tmvec, dctdomain
bash PLMs_cmds/run_dhr.sh -q ../data/db/astral_originrosetta.tsv -t ../data/db/astral.tsv -k 15177 -p 15177
Step 4 — Run sanity check:
python sanity_check/check_bio_mut_structure.py
Output saved to results/check_bio_structure/.
Probes whether similarity scores degrade gracefully when query sequences are artificially doubled (self-concatenated or shuffle-concatenated).
Run search methods:
source PLMs_cmds/.env
# Run for each query in {astral.fa, astral_doubshuf.fa, astral_doubself.fa}; target: astral.fa
# -k/-p: see table above (dctdomain: -k 3000 -p 1000; all others: -k 1000 -p 1000)
bash PLMs_cmds/run_blastp.sh -q ../data/db/astral.fa -t ../data/db/astral.fa -k 1000 -p 1000
# ... repeat for diamond, mmseq2, near, plm, tmvec, dctdomain
bash PLMs_cmds/run_dhr.sh -q ../data/db/astral.tsv -t ../data/db/astral.tsv -k 1000 -p 1000
Run sanity check:
python sanity_check/check_pert_doublen.py
Output saved to results/check_pert_doublen/.
Tests whether similarity scores decrease monotonically as query sequences are progressively truncated.
Run search methods:
source PLMs_cmds/.env
# Run for each query in {astral.fa, astral_trunchalf.fa, astral_truncqrt.fa}; target: astral.fa
# -k/-p: see table above (dctdomain: -k 3000 -p 1000; all others: -k 1000 -p 1000)
bash PLMs_cmds/run_blastp.sh -q ../data/db/astral.fa -t ../data/db/astral.fa -k 1000 -p 1000
# ... repeat for diamond, mmseq2, near, plm, tmvec, dctdomain
bash PLMs_cmds/run_dhr.sh -q ../data/db/astral.tsv -t ../data/db/astral.tsv -k 1000 -p 1000
Run sanity check:
python sanity_check/check_pert_truncation.py
Output saved to results/check_pert_truncation/.
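The monotonicity property tested in this experiment can be stated in a few lines of code. The sketch below is an illustration of the idea only and does not reproduce the metric computed by check_pert_truncation.py. It assumes that, for each query, one similarity score is available at each truncation level (full, half, quarter); the function names and toy scores are hypothetical.

```python
def is_non_increasing(scores, tol=0.0):
    """True if each score is no larger than the one before it (within tol)."""
    return all(later <= earlier + tol
               for earlier, later in zip(scores, scores[1:]))


def monotone_fraction(score_series, tol=0.0):
    """Fraction of queries whose scores never rise as truncation increases."""
    if not score_series:
        return 0.0
    ok = sum(1 for scores in score_series if is_non_increasing(scores, tol))
    return ok / len(score_series)


if __name__ == "__main__":
    # Toy scores ordered as (full, half, quarter) for two hypothetical queries.
    series = [
        [0.9, 0.6, 0.3],   # decreases at every step
        [0.9, 0.95, 0.3],  # rises after the first truncation
    ]
    print(monotone_fraction(series))
```

The `tol` argument allows small numerical fluctuations to be ignored when comparing consecutive scores.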
Evaluates whether methods can distinguish true homologs from permuted-data decoys (sequences with destroyed biological signal).
Run search against a combined database of standard and shuffled sequences:
source PLMs_cmds/.env
# Query: astral.fa | Target: astral_shuf_ori.fa
# -k/-p: see table above (dctdomain: -k 91062 -p 30354; all others: -k 30354 -p 30354)
bash PLMs_cmds/run_blastp.sh -q ../data/db/astral.fa -t ../data/db/astral_shuf_ori.fa -k 30354 -p 30354
# ... repeat for near, plm, tmvec, dctdomain
bash PLMs_cmds/run_dhr.sh -q ../data/db/astral.tsv -t ../data/db/astral_shuf_ori.tsv -k 30354 -p 30354
Run sanity check:
python sanity_check/check_perm_data.py
Output saved to results/check_perm_data/.
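One simple way to think about decoy sensitivity is the share of shuffled sequences among a query's top-ranked hits. The sketch below illustrates that quantity and is not the metric implemented in check_perm_data.py. It assumes the IDs of the shuffled sequences are known as a set; the function name `decoy_rate` and the toy IDs are hypothetical.

```python
def decoy_rate(ranked_hits, decoy_ids, top_n):
    """Fraction of the top_n ranked hits that are decoy (shuffled) sequences."""
    top = ranked_hits[:top_n]
    if not top:
        return 0.0
    return sum(1 for hit in top if hit in decoy_ids) / len(top)


if __name__ == "__main__":
    # Toy ranked hit list for one query, best hit first.
    hits = ["d1abc_", "shuf_d1xyz_", "d1def_", "shuf_d1abc_"]
    decoys = {"shuf_d1xyz_", "shuf_d1abc_"}
    print(decoy_rate(hits, decoys, top_n=2))
```

A method that separates true homologs from decoys well should keep this rate low for small `top_n`.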
Tests whether PLM-based methods produce consistent representations by comparing standard embeddings against permuted-model variants.
Run standard search against database (baseline):
We directly reuse the search results obtained from Experiment 5.
Run permuted-model search:
# Query: astral.fa | Target: astral.fa
# -k/-p: see table above (dctdomain: -k 45531 -p 15177; all others: -k 15177 -p 15177)
bash PLMs_cmds/run_perm_plm.sh -q ../data/db/astral.fa -t ../data/db/astral.fa -k 15177 -p 15177
# ... repeat for tmvec, dctdomain
bash PLMs_cmds/run_perm_dhr.sh -q ../data/db/astral.tsv -t ../data/db/astral.tsv -k 15177 -p 15177
Run sanity check:
python sanity_check/check_perm_model.py
Output saved to results/check_perm_model/.
After running all six experiments, generate the combined bubble plot:
python sanity_check/overall_performance.py
Output saved to results/overall_performance/overall_performance_bubble.pdf.
