Physics-gated cross-model ranker for commercially-redistributable protein co-folding outputs.
Modern co-folding models (Boltz-2, Chai-1, Protenix, OpenFold3, …) each emit their own confidence scores on their own scale. High model confidence does not guarantee a physically valid pose: a structure can score well yet contain steric clashes or impossible bond geometry.
foldconsensus is a small, dependency-light harness that:
- Normalizes raw confidence/affinity from each backend into a common scale (one module owns every sign and unit convention, so a backend quirk cannot leak silently into the ranking).
- Calibrates per-model confidence with a self-contained isotonic / Platt layer (numpy + scipy only — no torch in the core), and combines models with a uniform-weighted ensemble by default; an inverse-error (inverse-Brier) weighting is available and is adopted only when it beats the uniform baseline on held-out data (otherwise uniform is kept).
- Physics-gates every candidate with PoseBusters: a pose that fails a hard physical check is multiplicatively zeroed (hard_valid = False ⇒ Q = 0) rather than merely down-weighted.
- Ranks the surviving candidates by the gated score Q(x).
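The "adopt inverse-Brier only when it beats uniform" rule from the list above can be sketched as follows. This is a minimal illustration, not the package's code: the function names, the two-split layout (weights derived on one split, compared on a separate held-out split), and the strict-inequality tie-break are all assumptions.

```python
import numpy as np


def brier(p, y):
    """Mean squared error between probabilities p and binary labels y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


def choose_ensemble_weights(fit_probs, fit_y, held_probs, held_y):
    """Return per-model weights (hypothetical helper).

    fit_probs / held_probs: shape (n_models, n_samples), calibrated
    probabilities. Inverse-Brier weights are derived on the fit split and
    kept only if they strictly beat uniform weights on the held-out split.
    """
    fit_probs = np.asarray(fit_probs, dtype=float)
    held_probs = np.asarray(held_probs, dtype=float)
    n_models = fit_probs.shape[0]

    uniform = np.full(n_models, 1.0 / n_models)

    # Inverse-error weights; the floor avoids division by zero.
    errors = np.array([brier(p, fit_y) for p in fit_probs])
    inverse = 1.0 / np.maximum(errors, 1e-12)
    inverse /= inverse.sum()

    # Compare both ensembles on held-out data; ties keep the uniform default.
    if brier(inverse @ held_probs, held_y) < brier(uniform @ held_probs, held_y):
        return inverse
    return uniform
```

Keeping uniform on ties means the learned weights have to earn their place, which matches the "otherwise uniform is kept" wording above.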
v0.1.0a3 CLAIM — the PoseBusters physics gate.
foldconsensus demonstrates, on real crystal-structure ligand poses (PoseBusters-benchmark PDB entries), that a high-confidence pose which fails a hard physical check is multiplicatively zeroed and removed from the top of the ranking. The gate is backend-agnostic; running it on a live Boltz-2 cofolder needs a GPU and is deferred to v0.1.1.
v0.1.0a3 NOT a claim — calibration accuracy.
The bundled calibration is validated for algorithmic correctness only (monotonicity, ECE→0 under perfect calibration, NaN-safety). Correcting the real miscalibration of a specific cofolder is deferred to v0.1.1; any calibration result measured on synthetic data is reported with an explicit algorithm-validation-only disclaimer.
foldconsensus does not bundle any model weights. It is a ranking layer that runs on top of backends you install yourself. AlphaFold 3 is deliberately not a backend (its weights are non-commercial); every bundled backend is commercially redistributable. See NOTICE.
ABCFold runs several folding models and visualizes their raw outputs side by side. foldconsensus differs in the ranking logic: cross-model calibration plus a multiplicative physics gate, rather than a side-by-side display of raw scores.
    pip install foldconsensus               # core: numpy + scipy only
    pip install "foldconsensus[physics]"    # + rdkit + posebusters (physics gate)
    pip install "foldconsensus[boltz]"      # + boltz + torch (Boltz-2 backend)

    foldconsensus doctor                    # backend availability (live vs mock), honest
    foldconsensus rank -i candidates.json   # rank candidate poses with the physics gate
    foldconsensus calibrate -i scores.json  # fit / report a calibration curve

Input notes for rank:
- The input schema (with a runnable example) is examples/candidates.json.
- physics_checks values must be booleans (pass/fail); a non-boolean is rejected rather than silently treated as a pass. Raw PoseBusters numeric columns must be reduced to booleans first (the [physics] adapter does this for you).
- A candidate with no physics_checks is not gated (treated as valid): the gate only removes poses with positive evidence of invalidity.
- The optional affinity field is collected but not yet used in v0.1.0a3 ranking (deferred to v0.2).
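The fail-closed handling of physics_checks described in these notes might look like the sketch below. It assumes that physics_checks is a mapping from check name to boolean; the authoritative schema is examples/candidates.json, and the function name here is hypothetical.

```python
from typing import Optional


def read_physics_checks(candidate: dict) -> Optional[dict]:
    """Return validated checks, or None when the candidate carries none.

    Hypothetical helper: None means "no evidence of invalidity", so the
    caller does not gate the pose. Any non-boolean value is rejected.
    """
    checks = candidate.get("physics_checks")
    if checks is None:
        return None
    if not isinstance(checks, dict):
        raise TypeError("physics_checks must map check names to booleans")
    for name, value in checks.items():
        # bool is a subclass of int in Python, so test for bool explicitly:
        # a raw 1 or 1.7 must not pass as True.
        if not isinstance(value, bool):
            raise ValueError(
                f"physics_checks[{name!r}] must be a boolean, got {value!r}"
            )
    return dict(checks)
```

Rejecting numeric values outright, instead of coercing them with bool(), is what keeps a raw clash distance such as 1.7 from being read as a pass.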
The core ranking formula is:
Q(x) = hard_valid(x) * [ α * p_ensemble + β * (1 - u) + γ * pb_pass_rate ]
where α = β = γ = 1/3 (fixed, untuned in this release; weight learning is deferred to v0.2).
| Term | Meaning |
|---|---|
| p_ensemble | Calibrated ensemble probability across all available backends |
| u | Epistemic uncertainty; (1 - u) rewards agreement across models |
| pb_pass_rate | Fraction of PoseBusters checks passed (soft heuristics included in rate but not in hard gate) |
| hard_valid | Multiplicative gate: if any hard physical check fails, Q is forced to exactly 0 |
Soft checks (e.g. internal_energy, non-aromatic_ring_non-flatness) count toward pb_pass_rate but do not alone trigger the hard gate. Every other PoseBusters boolean check is treated as hard.
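As an illustration of the formula and the hard/soft split, here is a minimal sketch. The function name is hypothetical, the soft-check set contains only the two examples named above (the real list may be longer), and treating a candidate without checks as pb_pass_rate = 1.0 is an assumption of this sketch.

```python
# Only the two soft checks named in the text; the real list may differ.
SOFT_CHECKS = frozenset({"internal_energy", "non-aromatic_ring_non-flatness"})


def gated_score(p_ensemble, u, checks=None, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """Q(x) = hard_valid * (alpha*p_ensemble + beta*(1 - u) + gamma*pb_pass_rate)."""
    if checks:
        # Every check that is not soft is hard; one failure closes the gate.
        hard_valid = all(ok for name, ok in checks.items() if name not in SOFT_CHECKS)
        # Soft and hard checks both count toward the pass rate.
        pb_pass_rate = sum(checks.values()) / len(checks)
    else:
        # No physics evidence: not gated. ASSUMPTION: pass rate defaults to 1.0.
        hard_valid = True
        pb_pass_rate = 1.0
    if not hard_valid:
        return 0.0
    return alpha * p_ensemble + beta * (1.0 - u) + gamma * pb_pass_rate


# Rank hypothetical candidates: a confident pose with a failed hard check
# drops to Q = 0, while a soft failure only lowers pb_pass_rate.
candidates = [
    {"id": "confident_but_invalid", "p": 0.97, "u": 0.05,
     "checks": {"bond_lengths": False, "bond_angles": True}},
    {"id": "soft_failure_only", "p": 0.80, "u": 0.10,
     "checks": {"bond_lengths": True, "internal_energy": False}},
]
ranked = sorted(
    candidates,
    key=lambda c: gated_score(c["p"], c["u"], c["checks"]),
    reverse=True,
)
```

Because the gate multiplies the weighted sum, no setting of α, β, γ can lift a hard-invalid pose above a valid one with a positive score.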
All numbers are generated by scripts/measure.py into bench_results/*.json (env-stamped, deterministic) and are never hand-edited. The v0.1.0a1 filenames are intentional: v0.1.0a2/a3 were packaging/CI-only patches with no metric or logic change.
Physics gate (the claim) — real data.
On the real PoseBusters-benchmark PDB ligands (1s3v, 1of6, 1uou, 1ia1, including one protein–ligand mol_cond run):
- All 5 real poses pass the hard gate.
- A controlled atomic perturbation of the real 1uou ligand (bond-length/angle violation) is zeroed to Q = 0 despite a 0.97 model confidence: a high-confidence but physically invalid pose is removed, not merely down-weighted.
Source: bench_results/gate_v0.1.0a1.json. Live cofolder (Boltz-2) inference needs a GPU and is deferred to v0.1.1; the gate is backend-agnostic, so the perturbed real ligand pose stands in for "a confident pose that violates physics".
Calibration — algorithm validation only (synthetic). On synthetic miscalibrated data, the isotonic calibrator:
- Reduces ECE 0.159 → 0.022 (held-out test split; bootstrap 95% CI [0.018, 0.046])
- Reduces Brier 0.199 → 0.170
- Preserves ranking (Spearman rho ~ 0.998)
This is a correctness check of the calibration algorithm, not a claim about correcting any real cofolder's miscalibration (deferred to v0.1.1). Source: bench_results/calibration_v0.1.0a1.json.
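For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them with numpy. The equal-width 10-bin ECE estimator is an assumption; the binning used by scripts/measure.py is not stated here, so this code is not expected to reproduce the reported numbers.

```python
import numpy as np


def brier_score(p, y):
    """Mean squared difference between predicted probability and outcome."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


def expected_calibration_error(p, y, n_bins=10):
    """Equal-width-bin ECE: weighted mean |confidence - accuracy| over bins."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in 0..n_bins-1;
    # p == 1.0 falls in the last bin.
    idx = np.digitize(p, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # mask.mean() is the fraction of samples in this bin.
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```

A perfectly calibrated input such as p = y = [0, 0, 1, 1] gives an ECE of 0, which is the "ECE→0 under perfect calibration" property the correctness checks rely on.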
v0.1.0a3 — pre-alpha.
- Implemented backend: Boltz-2 (adapter; the live cofolder needs the [boltz] extra + a GPU).
- Chai-1 is deferred to v0.1.1; Protenix / OpenFold3 are stubs (NotImplementedError, never placeholder values).
- A structural-agreement ranking axis (pairwise TM-score / RMSD median across the top poses) is not yet implemented and is deferred to v0.1.1; it is not part of the gated score Q.
- v0.1.0a3: packaging/CI hygiene (no metric or logic change; the v0.1.0a1 measurement artifacts are unchanged). Bounded requires-python to <3.14 so the [boltz] extra's transitive dependency pins resolve (fixes a dependency-resolver failure on the 3.14 split); bumped CI actions to the Node 24 runtime (actions/checkout@v6, astral-sh/setup-uv@v7).
- v0.1.0a2: hardening patch (no metric changes; the v0.1.0a1 measurement artifacts are unchanged). The physics gate now fails closed on non-boolean physics_checks input (a raw numeric clash distance can no longer be mistaken for a pass via the CLI); added an input-schema example, cleaner CLI errors, and bench_results/examples in the sdist.
- v0.1.0a1: initial pre-alpha (calibration core, Boltz-2 adapter, PoseBusters physics gate).
memcanon v0.2+ accepts events from this repo via a thin in-process shim and content-hashes them into a local audit store.

memcanon is not on PyPI yet. Install it from the tagged release:

    pip install "git+https://github.com/hinanohart/memcanon@v0.2.0a2"

Emitting an event:

    from memcanon.emit import emit
    from memcanon.store.local import LocalStore

    with LocalStore("audit") as store:
        emit("foldconsensus", {"kind": "...", "decision": "..."}, store=store)

Each record is tagged source:foldconsensus + schema:memcanon-emit/1. Memcanon's memcanon export --format eu-ai-act-12 --to OUT.json can then build an Article 12(2) paragraph-mapped audit-log artefact (SHAPE only, NOT a conformity assessment).
MIT. See LICENSE for third-party components.
