Issue 42: add Boltz tracing + SLURM runner + docs by ArishV1 · Pull Request #45 · AI2Science/vizfold-foundation

ArishV1 · 2026-02-22T00:11:27Z

Context

Delivers end-to-end Boltz-2 inference + tracing for Issue #42 (VizFold-style attention traces and arc diagrams).
Prioritizes reproducibility on PACE ICE / Slurm GPU: a scratch environment, a deterministic runner, stable OUT_RUN naming, a consistent output layout, and automated validation, including strict checks that match what the Slurm runner enforces.

New Functionality

Tracing injection (no Boltz fork)

Tracing is injected through boltz_trace/sitecustomize.py and is active only when BOLTZ_TRACE_DIR is set (the runner puts boltz_trace first on PYTHONPATH, so Python's site module imports sitecustomize at interpreter startup).
Exports attention-style traces in VizFold’s text format:
- msa_row_attn_layer{L}.txt
- triangle_start_attn_layer{L}_residue_idx_{r}.txt
- triangle_end_attn_layer{L}_residue_idx_{r}.txt
Supports BOLTZ_TRACE_LAYERS, BOLTZ_TRACE_RESIDUES, BOLTZ_TRACE_TOPK, BOLTZ_TRACE_HEAD=all|<idx> (runner maps BOLTZ_TRACE_HEAD → internal TRACE_HEAD; default all).
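A minimal sketch of how a sitecustomize-style hook could read these knobs and enumerate the expected trace files. It assumes comma-separated integer lists for layers and residues and a default of layer 0; `parse_trace_config` and `expected_trace_files` are hypothetical helpers, not functions from boltz_trace. It returns None when BOLTZ_TRACE_DIR is unset, mirroring the "only active when set" behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Union


@dataclass
class TraceConfig:
    trace_dir: str
    layers: List[int]
    residues: List[int]
    topk: Optional[int]
    head: Union[str, int]  # "all" or a single head index


def _int_list(raw: str) -> List[int]:
    # Assumed format: comma-separated integers such as "0,1,2".
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def parse_trace_config(env: Mapping[str, str] = os.environ) -> Optional[TraceConfig]:
    """Hypothetical parser: returns None (tracing off) unless BOLTZ_TRACE_DIR is set."""
    trace_dir = env.get("BOLTZ_TRACE_DIR")
    if not trace_dir:
        return None
    head_raw = env.get("BOLTZ_TRACE_HEAD", "all").strip()
    topk_raw = env.get("BOLTZ_TRACE_TOPK", "").strip()
    return TraceConfig(
        trace_dir=trace_dir,
        layers=_int_list(env.get("BOLTZ_TRACE_LAYERS", "0")),  # default layer 0 is an assumption
        residues=_int_list(env.get("BOLTZ_TRACE_RESIDUES", "")),
        topk=int(topk_raw) if topk_raw else None,
        head="all" if head_raw == "all" else int(head_raw),
    )


def expected_trace_files(cfg: TraceConfig) -> List[str]:
    """Expand the three VizFold filename patterns for the configured layers/residues."""
    names = []
    for layer in cfg.layers:
        names.append(f"msa_row_attn_layer{layer}.txt")
        for r in cfg.residues:
            names.append(f"triangle_start_attn_layer{layer}_residue_idx_{r}.txt")
            names.append(f"triangle_end_attn_layer{layer}_residue_idx_{r}.txt")
    return names
```

With two layers and one residue this yields six names: one MSA-row file plus a start/end triangle pair per layer.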

Proxy-head expansion for format compatibility

scripts/boltz/expand_proxy_heads.py: when proxy-mode traces are effectively single-head, it duplicates the head blocks so the arc pipeline sees a consistent multi-head layout (4 heads by default in the runner).
Explicitly documented and logged as a format shim: in proxy mode the copies are identical, not independent per-head attention. The script returns a meaningful exit status for use in scripts.
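The duplication idea can be sketched at the data-structure level. This assumes traces have already been parsed into a hypothetical `{head_index: [(i, j, weight), ...]}` mapping; the real script works on the text files, and `expand_proxy_heads` here is an illustrative function, not its API.

```python
import logging
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (source residue, target residue, weight); assumed edge shape
log = logging.getLogger("expand_proxy_heads_sketch")


def expand_proxy_heads(blocks: Dict[int, List[Edge]], n_heads: int = 4) -> Dict[int, List[Edge]]:
    """Replicate a lone proxy head into n_heads identical copies (a format shim only)."""
    if len(blocks) != 1:
        return blocks  # already multi-head (or empty): leave untouched
    src_head, edges = next(iter(blocks.items()))
    log.info("proxy mode: duplicating head %d into %d identical heads (format shim)",
             src_head, n_heads)
    # Each copy is a new list, so later edits to one head do not alias the others.
    return {h: list(edges) for h in range(n_heads)}
```

Because every copy carries the same edges, per-head arc diagrams produced from the expanded layout will look identical, which is the expected outcome in proxy mode.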

Optional activation dumps

Writes act_npz/pairformer_boltz/*.npz when BOLTZ_ACT_DIR is set (the runner sets it); under --strict, the validator checks the expected keys and shapes (pair_norm, pair_slice).
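A minimal sketch of an activation-dump sanity check, assuming only the key names listed above. The concrete rules (non-empty, numeric, finite) are illustrative assumptions rather than the validator's actual shape checks, and `check_run_activations` is a hypothetical helper.

```python
import glob
import os
from typing import List

import numpy as np

# Key names come from the PR; the per-array rules below are illustrative assumptions.
REQUIRED_KEYS = ("pair_norm", "pair_slice")


def check_activation_npz(path: str) -> List[str]:
    """Return human-readable problems found in one activation dump."""
    errors: List[str] = []
    with np.load(path) as data:  # np.load on .npz returns an NpzFile (a context manager)
        for key in REQUIRED_KEYS:
            if key not in data.files:
                errors.append(f"{path}: missing key {key!r}")
                continue
            arr = data[key]
            if arr.size == 0:
                errors.append(f"{path}: {key} is empty")
            elif not np.issubdtype(arr.dtype, np.number):
                errors.append(f"{path}: {key} has non-numeric dtype {arr.dtype}")
            elif not np.all(np.isfinite(arr)):
                errors.append(f"{path}: {key} contains NaN/Inf")
    return errors


def check_run_activations(run_dir: str) -> List[str]:
    """Check every act_npz/pairformer_boltz/*.npz; no files means nothing to check."""
    pattern = os.path.join(run_dir, "act_npz", "pairformer_boltz", "*.npz")
    errors: List[str] = []
    for path in sorted(glob.glob(pattern)):
        errors.extend(check_activation_npz(path))
    return errors
```

Returning an empty list when no dumps exist matches the "optional" framing: the check only applies if act_npz files were written.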

Offline format + layout regression (CPU, no GPU)

scripts/boltz/check_trace_format_fixtures.py + committed scripts/boltz/fixtures/*.txt for trace text format checks.
tests/test_boltz_trace_validate.py: builds a minimal OUT_RUN and runs validate_boltz_traces.py --strict on it, plus a negative test that expects failure when manifest.json's run_dir does not match.

Runner / Reproducibility

Environment build (scratch, GPU-ready, avoids conflicts)

environment_boltz.yml, scripts/boltz/boltz_pip_extras.txt, scripts/boltz/setup_boltz_env.sh:
- conda env create/update on $SCR, pip extras + pinned boltz, smoke checks, boltz CLI verification.
- Header comments recommend a fresh prefix over an in-place update when debugging RDKit/Boltz import issues (see Boltz_Inference.md).

Slurm runner

scripts/boltz/run_boltz_trace.sbatch, scripts/boltz/run_boltz_trace.sh
Runner behavior:
- default BOLTZ_ENV=$SCR/conda/envs/boltz_clean (override with BOLTZ_ENV / sbatch --export=ALL).
- RUN_ID includes ${SLURM_JOB_ID:-$$} to avoid collisions when multiple jobs start in the same second.
- writes pred/, attn_txt/, arc_png/, act_npz/, and manifest.json; points caches (HF_HOME, TORCH_HOME, etc.) at scratch.
- runs boltz predict --no_kernels for compatibility across ICE GPUs (H100 and whichever other GPU types the site's #SBATCH lines allocate).
- copies components/pairformer_boltz/attn_txt/*.txt to attn_txt/ root for legacy plot/validate consumers.
- runs expand_proxy_heads.py when TRACE_HEAD=all (with explicit log line about duplicated proxy heads).
- ends with validate_boltz_traces.py --strict, so the job fails if validation fails.
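Two of these runner steps, sketched in Python for illustration. Only the `${SLURM_JOB_ID:-$$}` suffix comes from the PR; the timestamp layout of the run ID is an assumption, and because the exact location of `components/` relative to OUT_RUN is not stated, `flatten_traces` takes source and destination directories as arguments. Both functions are hypothetical, not part of the runner.

```python
import datetime
import glob
import os
import shutil
from typing import Mapping, Optional


def make_run_id(now: Optional[datetime.datetime] = None,
                env: Mapping[str, str] = os.environ) -> str:
    """Timestamp (assumed layout) plus job id, falling back to the PID like ${SLURM_JOB_ID:-$$}."""
    now = now or datetime.datetime.now()
    # `or` also treats an empty SLURM_JOB_ID as unset, matching the shell's :- operator.
    job = env.get("SLURM_JOB_ID") or str(os.getpid())
    return f"{now:%Y%m%d_%H%M%S}_{job}"


def flatten_traces(src_dir: str, dst_dir: str) -> int:
    """Copy src_dir/*.txt to dst_dir so legacy consumers find traces at the root."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for path in sorted(glob.glob(os.path.join(src_dir, "*.txt"))):
        shutil.copy2(path, os.path.join(dst_dir, os.path.basename(path)))
        copied += 1
    return copied
```

Appending the job ID (or PID outside Slurm) is what keeps two jobs that start in the same second from writing into the same OUT_RUN.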

Manifest

manifest.json: records run_dir, repo path + short git SHA, inputs, trace knobs, output subdirs, caches, and Boltz flags; under --strict it is validated against the actual filesystem tree.
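A minimal sketch of writing such a manifest. The field groups follow the list above, but the exact key names (`repo.git_sha`, `trace`, `caches`, `boltz_flags`) are assumptions, and `write_manifest` is a hypothetical helper rather than the runner's code. Storing resolved (realpath) locations is what makes a later realpath comparison against --run_dir straightforward.

```python
import json
import os
import subprocess
from typing import Any, Dict, List

OUTPUT_SUBDIRS = ("pred", "attn_txt", "arc_png", "act_npz")
CACHE_VARS = ("HF_HOME", "TORCH_HOME")  # subset of the cache variables the runner sets


def short_git_sha(repo: str) -> str:
    """Short SHA of HEAD, or 'unknown' if git is missing or repo is not a checkout."""
    try:
        result = subprocess.run(
            ["git", "-C", repo, "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def write_manifest(run_dir: str, repo: str, inputs: Dict[str, Any],
                   trace_knobs: Dict[str, Any], boltz_flags: List[str]) -> str:
    """Write run_dir/manifest.json with resolved paths (key names are assumptions)."""
    run_dir = os.path.realpath(run_dir)
    manifest = {
        "run_dir": run_dir,
        "repo": {"path": os.path.realpath(repo), "git_sha": short_git_sha(repo)},
        "inputs": inputs,
        "trace": trace_knobs,
        "outputs": {name: os.path.join(run_dir, name) for name in OUTPUT_SUBDIRS},
        "caches": {var: os.environ.get(var, "") for var in CACHE_VARS},
        "boltz_flags": boltz_flags,
    }
    path = os.path.join(run_dir, "manifest.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return path
```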

Validation + Visualization

Validation (`scripts/boltz/validate_boltz_traces.py`)

Core checks (always): required trace files, header/edge parsing, per-file head counts, and a heuristic check of the arc_png count against heads/layers/residues.
--strict additionally requires:
- manifest.json present with the required keys, and run_dir / outputs.* consistent with --run_dir after realpath resolution
- non-empty pred/ file tree
- well-formed attn_txt/component_status.json (msa, pairformer_boltz, sm_boltz entries)
- if act_npz/pairformer_boltz/*.npz exist, NumPy shape/key sanity checks
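The first three --strict checks can be sketched as a function that collects error strings. This is an illustration under assumptions, not validate_boltz_traces.py itself: `strict_checks` is hypothetical, only `run_dir` and `outputs` are treated as required manifest keys, and component_status.json is assumed to be a JSON object keyed by component name.

```python
import json
import os
from typing import List

REQUIRED_MANIFEST_KEYS = ("run_dir", "outputs")  # assumed subset of the real key list
REQUIRED_COMPONENTS = ("msa", "pairformer_boltz", "sm_boltz")


def _load_json(path: str):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def strict_checks(run_dir: str) -> List[str]:
    """Hypothetical strict checks: manifest alignment, non-empty pred/, component status."""
    errors: List[str] = []
    real_run = os.path.realpath(run_dir)

    try:
        manifest = _load_json(os.path.join(run_dir, "manifest.json"))
    except (OSError, ValueError) as exc:  # json.JSONDecodeError subclasses ValueError
        manifest = None
        errors.append(f"manifest.json unreadable: {exc}")
    if isinstance(manifest, dict):
        missing = [k for k in REQUIRED_MANIFEST_KEYS if k not in manifest]
        if missing:
            errors.append(f"manifest.json missing keys: {missing}")
        else:
            if os.path.realpath(str(manifest["run_dir"])) != real_run:
                errors.append("manifest run_dir does not match --run_dir")
            outputs = manifest["outputs"]
            if not isinstance(outputs, dict):
                errors.append("manifest outputs is not a JSON object")
            else:
                for name, path in outputs.items():
                    # Every declared output must resolve to a location inside --run_dir.
                    if os.path.commonpath([os.path.realpath(str(path)), real_run]) != real_run:
                        errors.append(f"outputs.{name} is outside --run_dir")
    elif manifest is not None:
        errors.append("manifest.json is not a JSON object")

    # pred/ must contain at least one file somewhere in its tree.
    if not any(files for _, _, files in os.walk(os.path.join(run_dir, "pred"))):
        errors.append("pred/ is missing or empty")

    try:
        status = _load_json(os.path.join(run_dir, "attn_txt", "component_status.json"))
    except (OSError, ValueError) as exc:
        errors.append(f"component_status.json unreadable: {exc}")
    else:
        if not isinstance(status, dict):
            errors.append("component_status.json is not a JSON object")
        else:
            errors.extend(f"component_status.json missing {c!r}"
                          for c in REQUIRED_COMPONENTS if c not in status)
    return errors
```

Comparing realpaths rather than raw strings means a run reached through a symlinked scratch path still passes, while a manifest copied from another run (the negative test case) fails.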

Arc diagram generation

Uses the same VizFold utilities as before; PNG names follow patterns like:
- arc_png/msa_row_head_*_layer_*_BOLTZ_arc.png
- arc_png/tri_start_res_*_...
- arc_png/tri_end_res_*_...

CI + repo hygiene

.github/workflows/undefined_names.yml: new boltz_trace_format job (fixture checker + unittest tests.test_boltz_trace_validate on Python 3.11); existing flake8 job unchanged.
.gitignore: ignore large run artifacts; un-ignore scripts/boltz/fixtures/*.txt and submission/issue42-demonstration-evidence/screenshots/*.png (root *.png is otherwise ignored).

Documentation

docs/source/Boltz_Inference.md:
- Scope table for Issue #42, Extend VizFold Inference and Visualization to Boltz (env, traces, activations, structures, metadata, validation, CI).
- Quickstart framed for PACE ICE / Slurm GPU nodes in general (not H100-only), with a note on editing the partition/GRES lines.
- Stronger clean-prefix / scratch guidance; RDKit sanity check; proxy vs true attention; expand_proxy_heads semantics.
- Reviewer pack pointer: submission/issue42-demonstration-evidence/REPRODUCIBILITY.md (commands + embedded screenshots).
README.md: links Boltz doc + submission REPRODUCIBILITY.md; fixes “Openfold” to “Openfold implementation” spelling on the OpenFold line.

How to test

CPU (matches CI)

cd "$(git rev-parse --show-toplevel)"
python3 scripts/boltz/check_trace_format_fixtures.py
python3 -m unittest tests.test_boltz_trace_validate -v
# optional local parity with Actions’ syntax scan:
python3 -m venv /path/on/scratch/flake8-venv && . /path/on/scratch/flake8-venv/bin/activate
pip install flake8
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
deactivate

ICE / Slurm (full pipeline)

cd "$(git rev-parse --show-toplevel)"
export SCR=/storage/ice1/2/0/$USER

export ENV="$SCR/conda/envs/boltz_clean_fresh"
rm -rf "$ENV"   # first-time / troubleshooting
bash scripts/boltz/setup_boltz_env.sh

if ! command -v module >/dev/null 2>&1; then
  [ -f /etc/profile.d/modules.sh ] && source /etc/profile.d/modules.sh
  [ -f /usr/share/Modules/init/bash ] && source /usr/share/Modules/init/bash
fi
module load anaconda3 >/dev/null 2>&1 || true
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate "$ENV"

python - <<'PY'
from rdkit import rdBase
from rdkit.Chem import Mol
print("RDKit OK:", rdBase.rdkitVersion)
PY

export BOLTZ_ENV="$ENV"
mkdir -p outputs/boltz_runs
JOBID=$(sbatch --export=ALL scripts/boltz/run_boltz_trace.sbatch | awk '{print $4}')
echo "JOBID=$JOBID"

Wait for the job to reach COMPLETED, then:

sacct -j "$JOBID" --format=JobID,State,ExitCode,Elapsed -n
tail -n 100 "outputs/boltz_runs/slurm-${JOBID}.out"

OUT_RUN=$(grep -m1 '^\[INFO\] OUT_RUN=' "outputs/boltz_runs/slurm-${JOBID}.out" | sed 's/.*OUT_RUN=//')
echo "OUT_RUN=$OUT_RUN"
[ -n "$OUT_RUN" ] || { echo "[ERROR] OUT_RUN is empty." >&2; exit 2; }

find "$OUT_RUN/attn_txt" -name "*.txt" | wc -l
find "$OUT_RUN/act_npz"  -name "*.npz" | wc -l
find "$OUT_RUN/arc_png"  -name "*.png" | wc -l

python3 scripts/boltz/validate_boltz_traces.py \
  --run_dir "$OUT_RUN" --layers 0 --residues 18 --strict

Expected: the Slurm stdout ends with [PASS] validate_boltz_traces.py, and the strict re-run exits 0, prints [OK] manifest.json ... paths match --run_dir, and reports no [FAIL].
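For post-run inspection from Python instead of grep/sed and find | wc -l, a rough equivalent is sketched below. It assumes only the `[INFO] OUT_RUN=` log line format used by the grep above; `find_out_run` and `count_artifacts` are hypothetical helpers, and unlike the sed command this version takes the value after the first `OUT_RUN=` and strips surrounding whitespace.

```python
import os
import re
from typing import Dict, Optional

OUT_RUN_RE = re.compile(r"^\[INFO\] OUT_RUN=(.*)$")


def find_out_run(log_text: str) -> Optional[str]:
    """First OUT_RUN= value in the Slurm log (like grep -m1), or None if absent."""
    for line in log_text.splitlines():
        match = OUT_RUN_RE.match(line)
        if match:
            return match.group(1).strip()
    return None


def count_artifacts(out_run: str) -> Dict[str, int]:
    """Recursive per-subdir file counts, mirroring the three find | wc -l commands."""
    wanted = {"attn_txt": ".txt", "act_npz": ".npz", "arc_png": ".png"}
    counts = {}
    for subdir, suffix in wanted.items():
        counts[subdir] = sum(
            name.endswith(suffix)
            for _, _, files in os.walk(os.path.join(out_run, subdir))
            for name in files
        )
    return counts
```

A missing subdirectory simply counts as 0, so an empty attn_txt entry is an easy signal that tracing did not write anything.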

Elliptic461 · 2026-03-20T02:32:13Z

Arish, I am getting this error.
It seems like nothing is being written to Attn.txt.


jayvenn21 · 2026-04-24T16:39:34Z

Hey team! This is Jayanth, the team lead from the Apache Airavata ESMFold team. I wanted to touch base since the semester is wrapping up. On the ESMFold side we're basically done: all intermediate extraction (attention, activations, trunk, structure module), validation tests, docs, and a frontend dashboard are merged on our fork. Suresh mentioned in our last meeting that he wants both the ESMFold and Boltz work merged into the main VizFold repo before the semester ends.
A few questions on your end:

Where are you at with Boltz extraction? Last I heard you were finishing the last of the three sections. Is that done or close?
Are your traces writing to the same archive format (structure/predicted.pdb, trace/attention/, trace/activations/, meta.json, etc.)? That'll make the merge a lot smoother if both backends produce the same layout.
Do you guys have a branch or fork we should be looking at? Happy to review PRs or help resolve any integration issues on our side.
No rush on a detailed response; I just want to make sure we're aligned so the merge doesn't become a last-minute scramble. Let me know if you need help with anything.

ArishV1 · 2026-04-24T16:42:12Z


Hi,

1.) Boltz extraction: we’re done and merge-ready on our side. Slurm runs on ICE produce structure outputs under pred/, VizFold-style attention traces under attn_txt/, arc PNGs, optional act_npz/ summaries, and manifest.json, plus CPU validation and an extra Actions job so the Boltz path is checked without a GPU. The final step is landing the PR on main.

2.) Same archive layout as ESMFold: not exactly. We use a flat OUT_RUN layout (pred/, attn_txt/, arc_png/, act_npz/, manifest.json) rather than structure/, trace/attention/, meta.json, etc. The artifacts are the same kind, but the paths and names differ, so we should check with Suresh on whether we need a shared canonical tree or a small adapter before or after the merge.

3.) Branch: issue42-boltz-tracing. Here is the link: Issue 42: add Boltz tracing + SLURM runner + docs #45. It is essentially final, though I might add a screen recording of the results. Feel free to leave a review.

If any changes need to be made for alignment, Yin, Kevin and Ramasubramanian, Vishal Subramanian, feel free to coordinate with Vennamreddy, Jayanth. I don't expect either PR to cause automatic breakage on merge. For conflicts, check our PR description, which summarizes every file we edited in detail. Thank you!

Arish Virani and others added 2 commits February 21, 2026 19:00

Issue 42: add Boltz tracing + SLURM runner + docs

3ecb1ff

Boltz: add trace validator + improve runner/docs

24b4608

Vishram1123 mentioned this pull request on Mar 20, 2026 in Boltz web runner #57 (Open)

Arish Virani and others added 4 commits March 20, 2026 12:40

Boltz tracer: robust attention capture + proxy fallback

5f20e7c

Add Boltz-2 tracing pipeline + reproducible ICE env setup

0f2cf00

Add Boltz modular trace layout (pairformer) with component_status reporting

6bc2acb

strict Boltz trace validation, fixtures, CI, docs, submission evidence

8f5ba9b

ArishV1 wants to merge 6 commits into AI2Science:main from ArishV1:issue42-boltz-tracing
