foldconsensus

Physics-gated cross-model ranker for commercially-redistributable protein co-folding outputs.

Modern co-folding models (Boltz-2, Chai-1, Protenix, OpenFold3, …) each emit confidence scores on their own scale. High model confidence does not guarantee a physically valid pose: a structure can score well yet contain steric clashes or impossible bond geometry.

foldconsensus is a small, dependency-light harness that:

Normalizes raw confidence/affinity from each backend into a common scale (one module owns every sign and unit convention, so a backend quirk cannot leak silently into the ranking).
Calibrates per-model confidence with a self-contained isotonic / Platt layer (numpy + scipy only — no torch in the core), and combines models with a uniform-weighted ensemble by default; an inverse-error (inverse-Brier) weighting is available and is adopted only when it beats the uniform baseline on held-out data (otherwise uniform is kept).
Physics-gates every candidate with PoseBusters: a pose that fails a hard physical check is multiplicatively zeroed (hard_valid = False ⇒ Q = 0) rather than merely down-weighted.
Ranks the surviving candidates by the gated score Q(x).
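
The "adopt inverse-Brier only when it beats uniform" rule in the calibration step can be sketched as follows. This is a minimal illustration, not the package's implementation: the function names `brier` and `choose_weights`, the fit/held-out split, and the strict-improvement tie-break are all assumptions made for the example.

```python
import numpy as np

def brier(p, y):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def choose_weights(P_fit, y_fit, P_holdout, y_holdout):
    """Pick ensemble weights (hypothetical helper).

    P_* are (n_samples, n_models) arrays of calibrated probabilities,
    y_* are 0/1 labels. Returns (scheme_name, weights).
    """
    P_fit = np.asarray(P_fit, dtype=float)
    P_holdout = np.asarray(P_holdout, dtype=float)
    n_models = P_fit.shape[1]
    uniform = np.full(n_models, 1.0 / n_models)

    # Inverse-Brier weights are estimated on the fitting split only.
    errors = np.array([brier(P_fit[:, j], y_fit) for j in range(n_models)])
    inv = 1.0 / np.maximum(errors, 1e-12)  # guard against a zero Brier score
    inverse_brier = inv / inv.sum()

    # Adopt them only if they strictly beat uniform on held-out data.
    score_uniform = brier(P_holdout @ uniform, y_holdout)
    score_inverse = brier(P_holdout @ inverse_brier, y_holdout)
    if score_inverse < score_uniform:
        return "inverse_brier", inverse_brier
    return "uniform", uniform
```

Because ties fall back to uniform, two backends with identical held-out behaviour never trigger a switch away from the default.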

Architecture

What this is (and is not)

v0.1.0a3 CLAIM — the PoseBusters physics gate. foldconsensus demonstrates, on real crystal-structure ligand poses (PoseBusters-benchmark PDB entries), that a high-confidence pose which fails a hard physical check is multiplicatively zeroed and removed from the top of the ranking. The gate is backend-agnostic; running it on a live Boltz-2 cofolder needs a GPU and is deferred to v0.1.1.

v0.1.0a3 NOT a claim: calibration accuracy. The bundled calibration is validated for algorithmic correctness only (monotonicity, ECE→0 under perfect calibration, NaN-safety). Correcting the real miscalibration of a specific cofolder is deferred to v0.1.1; any calibration result measured on synthetic data is reported with an explicit algorithm-validation-only disclaimer.
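
To make the correctness properties above concrete, here is a minimal pool-adjacent-violators (PAV) sketch of an isotonic fit using numpy only. It is an illustration under assumptions, not the bundled calibrator: `fit_isotonic` and `predict_isotonic` are hypothetical names, NaN pairs are simply dropped, at least one finite pair is assumed, and prediction uses linear interpolation between fitted points.

```python
import numpy as np

def fit_isotonic(scores, labels):
    """Fit a non-decreasing map from raw score to probability (PAV sketch)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # NaN-safety in this sketch: drop pairs with a NaN instead of propagating it.
    keep = ~(np.isnan(scores) | np.isnan(labels))
    x, y = scores[keep], labels[keep]
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]

    # Each block stores [sum of labels, count]; merge neighbours while the
    # block means violate monotonicity.
    blocks = []
    for value in y:
        blocks.append([value, 1.0])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = np.concatenate([np.full(int(n), s / n) for s, n in blocks])

    # Collapse tied scores to one point so the interpolation grid is increasing.
    ux, inverse = np.unique(x, return_inverse=True)
    uy = np.bincount(inverse, weights=fitted) / np.bincount(inverse)
    return ux, uy

def predict_isotonic(ux, uy, scores):
    """Map new raw scores through the fitted curve (clamped at both ends)."""
    return np.interp(np.asarray(scores, dtype=float), ux, uy)
```

Monotonicity here is structural: the merge loop only terminates when block means are non-decreasing, so `np.diff(uy) >= 0` holds for any input.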

foldconsensus does not bundle any model weights. It is a ranking layer that runs on top of backends you install yourself. AlphaFold 3 is deliberately not a backend (its weights are non-commercial); every bundled backend is commercially redistributable. See NOTICE.

Relation to prior work

ABCFold runs several folding models and visualizes their raw outputs side by side. foldconsensus differs in the ranking logic: cross-model calibration plus a multiplicative physics gate, rather than a side-by-side display of raw scores.

Install

pip install foldconsensus            # core: numpy + scipy only
pip install "foldconsensus[physics]" # + rdkit + posebusters (physics gate)
pip install "foldconsensus[boltz]"   # + boltz + torch (Boltz-2 backend)

Quickstart

foldconsensus doctor                    # report which backends are live vs mock
foldconsensus rank -i candidates.json   # rank candidate poses with the physics gate
foldconsensus calibrate -i scores.json  # fit / report a calibration curve

Input notes for rank:

The input schema (with a runnable example) is examples/candidates.json.
physics_checks values must be booleans (pass/fail); a non-boolean is rejected rather than silently treated as a pass — raw PoseBusters numeric columns must be reduced to booleans first (the [physics] adapter does this for you).
A candidate with no physics_checks is not gated (treated as valid): the gate only removes poses with positive evidence of invalidity.
The optional affinity field is collected but not yet used in v0.1.0a3 ranking (deferred to v0.2).
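
The input rules above can be sketched as a small validator. This is a hedged illustration only: `validate_physics_checks` is a hypothetical name, the candidate is assumed to be a plain dict with an optional `physics_checks` mapping, and the check name used in the usage note is illustrative rather than taken from examples/candidates.json.

```python
def validate_physics_checks(candidate):
    """Return the candidate's physics checks as a dict of name -> bool.

    A missing or empty `physics_checks` yields {} (the candidate is not
    gated). Any non-boolean value is rejected instead of being coerced.
    """
    checks = candidate.get("physics_checks")
    if not checks:
        return {}
    for name, value in checks.items():
        # Compare the exact type so that 0/1 and 0.0/1.0 are rejected too;
        # a raw numeric column must be reduced to pass/fail beforehand.
        if type(value) is not bool:
            raise TypeError(
                f"physics_checks[{name!r}] must be a bool, "
                f"got {type(value).__name__}"
            )
    return dict(checks)
```

For example, `validate_physics_checks({"physics_checks": {"bond_lengths": 1.2}})` raises `TypeError`, whereas a candidate with no `physics_checks` key returns an empty dict and is left ungated.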

How it works

The core ranking formula is:

Q(x) = hard_valid(x) * [ α * p_ensemble + β * (1 - u) + γ * pb_pass_rate ]

where α = β = γ = 1/3 (fixed, untuned in this release; weight learning is deferred to v0.2).

Term	Meaning
`p_ensemble`	Calibrated ensemble probability across all available backends
`u`	Epistemic uncertainty; `(1 - u)` rewards agreement across models
`pb_pass_rate`	Fraction of PoseBusters checks passed (soft heuristics included in rate but not in hard gate)
`hard_valid`	Multiplicative gate: if any hard physical check fails, `Q` is forced to exactly 0

Soft checks (e.g. internal_energy, non-aromatic_ring_non-flatness) count toward pb_pass_rate but do not trigger the hard gate on their own. Every other PoseBusters boolean check is treated as hard.
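
As a worked illustration of the formula, the sketch below computes Q(x) for a single candidate. It rests on assumptions that are not confirmed by the text: `gated_score` and `SOFT_CHECKS` are hypothetical names, the soft set contains only the two checks named above, and a candidate with no checks is given `pb_pass_rate = 1.0`.

```python
# Assumption: only the two soft checks named in the text; the real set may differ.
SOFT_CHECKS = {"internal_energy", "non-aromatic_ring_non-flatness"}

def gated_score(p_ensemble, u, physics_checks, alpha=1/3, beta=1/3, gamma=1/3):
    """Q(x) = hard_valid * (alpha*p_ensemble + beta*(1-u) + gamma*pb_pass_rate)."""
    checks = physics_checks or {}
    # Hard gate: every check that is not soft must pass.
    hard_valid = all(ok for name, ok in checks.items() if name not in SOFT_CHECKS)
    if not hard_valid:
        return 0.0  # multiplicative zero, not a down-weight
    # The pass rate counts hard and soft checks alike.
    # Assumption: with no checks at all, the rate defaults to 1.0.
    pb_pass_rate = sum(checks.values()) / len(checks) if checks else 1.0
    return alpha * p_ensemble + beta * (1.0 - u) + gamma * pb_pass_rate
```

With these defaults, a pose with confidence 0.97 that fails one hard check scores exactly 0, while a pose that fails only a soft check keeps a non-zero score with a reduced `pb_pass_rate`.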

Results (v0.1.0a1)

All numbers are generated by scripts/measure.py into bench_results/*.json (env-stamped, deterministic) and are never hand-edited. The v0.1.0a1 filenames are intentional: v0.1.0a2/a3 were packaging/CI-only patches with no metric or logic change.

Physics gate (the claim) — real data. On the real PoseBusters-benchmark PDB ligands (1s3v, 1of6, 1uou, 1ia1, including one protein–ligand mol_cond run):

All 5 real poses pass the hard gate.
A controlled atomic perturbation of the real 1uou ligand (bond-length/angle violation) is zeroed to Q = 0 despite a 0.97 model confidence — a high-confidence but physically invalid pose is removed, not merely down-weighted.

Source: bench_results/gate_v0.1.0a1.json. Live cofolder (Boltz-2) inference needs a GPU and is deferred to v0.1.1; the gate is backend-agnostic, so real ligand poses stand in for "a confident pose that violates physics".

Calibration — algorithm validation only (synthetic). On synthetic miscalibrated data, the isotonic calibrator:

Reduces ECE 0.159 → 0.022 (held-out test split; bootstrap 95% CI [0.018, 0.046])
Reduces Brier 0.199 → 0.170
Preserves ranking (Spearman ρ ≈ 0.998)

This is a correctness check of the calibration algorithm, not a claim about correcting any real cofolder's miscalibration (deferred to v0.1.1). Source: bench_results/calibration_v0.1.0a1.json.
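
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It is not the code in scripts/measure.py: the function names are hypothetical, and the binning (10 equal-width bins, weighted by bin occupancy) is an assumption, so the numbers it produces need not match the reported ones.

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def expected_calibration_error(p, y, n_bins=10):
    """Occupancy-weighted mean gap between confidence and accuracy per bin."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0..n_bins-1; p == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(p[mask].mean() - y[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the bin's share of samples
    return float(ece)
```

Under perfect calibration every bin's mean confidence equals its empirical frequency, so the ECE is 0, which is the "ECE→0" property the validation checks for.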

Status

v0.1.0a3 — pre-alpha.

Implemented backend: Boltz-2 (adapter; the live cofolder needs the [boltz] extra + a GPU).
Chai-1 is planned for v0.1.1; Protenix and OpenFold3 are stubs that raise NotImplementedError rather than return placeholder values.
A structural-agreement ranking axis (pairwise TM-score / RMSD median across the top poses) is not yet implemented and is deferred to v0.1.1; it is not part of the gated score Q.

Changelog

v0.1.0a3 — packaging/CI hygiene (no metric or logic change; the v0.1.0a1 measurement artifacts are unchanged). Bounded requires-python to <3.14 so the [boltz] extra's transitive dependency pins resolve (fixes a dependency-resolver failure on the 3.14 split); bumped CI actions to the Node 24 runtime (actions/checkout@v6, astral-sh/setup-uv@v7).
v0.1.0a2 — hardening patch (no metric changes; the v0.1.0a1 measurement artifacts are unchanged). The physics gate now fails closed on non-boolean physics_checks input (a raw numeric clash distance can no longer be mistaken for a pass via the CLI); added an input-schema example, cleaner CLI errors, and bench_results/examples in the sdist.
v0.1.0a1 — initial pre-alpha: calibration core, Boltz-2 adapter, PoseBusters physics gate.

Audit-trail integration (memcanon)

memcanon v0.2+ accepts events from this repo via a thin in-process shim and content-hashes them into a local audit store:

memcanon is not on PyPI yet. Install it from the tagged release:
pip install "git+https://github.com/hinanohart/memcanon@v0.2.0a2"

from memcanon.emit import emit
from memcanon.store.local import LocalStore

with LocalStore("audit") as store:
    emit("foldconsensus", {"kind": "...", "decision": "..."}, store=store)

Each record is tagged source:foldconsensus + schema:memcanon-emit/1. The memcanon export --format eu-ai-act-12 --to OUT.json command can then build an Article 12(2) paragraph-mapped audit-log artefact (SHAPE only, NOT a conformity assessment).

License

MIT. See LICENSE for third-party components.
