Issue 42: add Boltz tracing + SLURM runner + docs #45
Hey team! This is Jayanth, the team lead from the Apache Airavata ESMFold team. I wanted to touch base since the semester is wrapping up. On the ESMFold side we're basically done: all intermediate extraction (attention, activations, trunk, structure module), validation tests, docs, and a frontend dashboard are merged on our fork. Suresh mentioned in our last meeting that he wants both the ESMFold and Boltz work merged into the main vizfold repo before the semester ends.
Hi,

1. Boltz extraction. We're done and merge-ready on our side. Slurm runs on ICE produce:
- structure outputs under `pred/`
- VizFold-style attention traces under `attn_txt/`
- arc PNGs
- optional `act_npz/` summaries
- `manifest.json`

There is also CPU validation and an extra GitHub Actions job, so the Boltz path is checked without a GPU. The final step is landing the PR on main.
If any changes need to be made for alignment,

Context
`OUT_RUN` naming, consistent layout, and automated validation, including strict checks aligned with what the Slurm runner enforces).

New Functionality
Tracing injection (no Boltz fork)
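The bullets below describe how the injected tracer is switched on and configured. As a rough illustration, here is a minimal Python sketch of reading those knobs and deriving the trace file names. It is not the actual `sitecustomize.py` code, and it rests on these assumptions:
- `BOLTZ_TRACE_LAYERS` / `BOLTZ_TRACE_RESIDUES` hold comma-separated integers.
- `BOLTZ_TRACE_TOPK` is an integer with a made-up default of 50.
- The helper names (`read_trace_knobs`, `trace_filenames`) are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class TraceKnobs:
    layers: List[int]
    residues: List[int]
    topk: int
    head: Optional[int]  # None means "all" heads


def _int_list(raw: str) -> List[int]:
    # Hypothetical syntax: "0,1,47" -> [0, 1, 47]; blank entries are ignored.
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def read_trace_knobs(env: Mapping[str, str] = os.environ) -> Optional[TraceKnobs]:
    # Tracing stays off unless BOLTZ_TRACE_DIR is set.
    if not env.get("BOLTZ_TRACE_DIR"):
        return None
    head_raw = env.get("BOLTZ_TRACE_HEAD", "all")
    return TraceKnobs(
        layers=_int_list(env.get("BOLTZ_TRACE_LAYERS", "")),
        residues=_int_list(env.get("BOLTZ_TRACE_RESIDUES", "")),
        topk=int(env.get("BOLTZ_TRACE_TOPK", "50")),  # default is an assumption
        head=None if head_raw == "all" else int(head_raw),
    )


def trace_filenames(knobs: TraceKnobs) -> List[str]:
    # One MSA row-attention file per layer, plus one triangle start/end file
    # per (layer, residue) pair, following the naming patterns in this PR.
    names = [f"msa_row_attn_layer{L}.txt" for L in knobs.layers]
    for L in knobs.layers:
        for r in knobs.residues:
            names.append(f"triangle_start_attn_layer{L}_residue_idx_{r}.txt")
            names.append(f"triangle_end_attn_layer{L}_residue_idx_{r}.txt")
    return names
```

For example, with `BOLTZ_TRACE_DIR` set, layers `0,1`, and residue `5`, this yields six names: two MSA row files plus a triangle start/end pair for each (layer, residue).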
- `boltz_trace/sitecustomize.py`, active when `BOLTZ_TRACE_DIR` is set (the runner puts `boltz_trace` first on `PYTHONPATH`).
- Trace files:
  - `msa_row_attn_layer{L}.txt`
  - `triangle_start_attn_layer{L}_residue_idx_{r}.txt`
  - `triangle_end_attn_layer{L}_residue_idx_{r}.txt`
- Knobs: `BOLTZ_TRACE_LAYERS`, `BOLTZ_TRACE_RESIDUES`, `BOLTZ_TRACE_TOPK`, `BOLTZ_TRACE_HEAD=all|<idx>` (the runner maps `BOLTZ_TRACE_HEAD` → internal `TRACE_HEAD`; default `all`).

Proxy-head expansion for format compatibility
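The duplication step described below can be sketched roughly as follows. This sketch assumes a hypothetical trace layout: each block starts with a `Layer {L}, Head {H}` header line followed by data rows. The real VizFold text format may differ, and `expand_proxy_heads` here is an illustrative function, not the script's actual API.

```python
import re
from typing import Dict, List

# Hypothetical block header; the real trace format may differ.
HEADER = re.compile(r"^Layer (\d+), Head (\d+)$")


def parse_blocks(lines: List[str]) -> Dict[int, Dict[int, List[str]]]:
    """Group data rows by layer and head under each header line."""
    blocks: Dict[int, Dict[int, List[str]]] = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        m = HEADER.match(line)
        if m:
            layer, head = int(m.group(1)), int(m.group(2))
            current = blocks.setdefault(layer, {}).setdefault(head, [])
        elif line.strip() and current is not None:
            current.append(line)
    return blocks


def expand_proxy_heads(lines: List[str], n_heads: int = 4) -> List[str]:
    """Duplicate single-head layers so each one exposes heads 0..n_heads-1."""
    out: List[str] = []
    for layer, heads in sorted(parse_blocks(lines).items()):
        if len(heads) == 1:
            # Proxy mode: only one head was traced, so copy it into every slot.
            rows = next(iter(heads.values()))
            heads = {h: rows for h in range(n_heads)}
        for head, rows in sorted(heads.items()):
            out.append(f"Layer {layer}, Head {head}")
            out.extend(rows)
    return out
```

Layers that already have more than one head keep their own blocks; only single-head layers are duplicated, up to the runner's default of 4 heads.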
- `scripts/boltz/expand_proxy_heads.py`: when traces are effectively single-head in proxy mode, duplicates blocks so the arc pipeline can target a consistent multi-head layout (default 4 heads in the runner).

Optional activation dumps
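The `--strict` key/shape check on these archives could look roughly like the sketch below. An `.npz` is a zip of `.npy` members, and each member's header records its shape, so this sketch reads only those headers with the standard library. It therefore runs on a CPU-only runner without loading any array data. The "at least 2-D" rule and the helper names are assumptions, not the validator's real logic.

```python
import ast
import struct
import zipfile
from typing import BinaryIO, Dict, List, Sequence, Tuple, Union

NPY_MAGIC = b"\x93NUMPY"


def _npy_shape(f: BinaryIO) -> Tuple[int, ...]:
    """Parse only the header of a .npy stream and return its shape."""
    prefix = f.read(8)  # 6-byte magic + major/minor version bytes
    if prefix[:6] != NPY_MAGIC:
        raise ValueError("not a .npy member")
    # Format 1.0 stores the header length in 2 bytes; 2.0 and 3.0 use 4.
    fmt = "<H" if prefix[6] == 1 else "<I"
    (hlen,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    header = ast.literal_eval(f.read(hlen).decode("latin1"))
    return tuple(header["shape"])


def npz_shapes(source: Union[str, BinaryIO]) -> Dict[str, Tuple[int, ...]]:
    """Map each array name in an .npz archive (path or file object) to its shape."""
    shapes = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith(".npy"):
                with zf.open(name) as f:
                    shapes[name[: -len(".npy")]] = _npy_shape(f)
    return shapes


def check_pair_npz(
    source: Union[str, BinaryIO],
    required: Sequence[str] = ("pair_norm", "pair_slice"),
) -> List[str]:
    """Return a list of problems; an empty list means the archive passes."""
    shapes = npz_shapes(source)
    problems = [f"missing key {k!r}" for k in required if k not in shapes]
    for k in required:
        # Assumption: pair tensors carry at least two residue axes.
        if k in shapes and len(shapes[k]) < 2:
            problems.append(f"{k!r} has shape {shapes[k]}, expected >= 2 dims")
    return problems
```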
- `act_npz/pairformer_boltz/*.npz` when `BOLTZ_ACT_DIR` is set (the runner sets it); expected keys/shapes (`pair_norm`, `pair_slice`) are validated under `--strict`.

Offline format + layout regression (CPU, no GPU)
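For a feel of what this offline regression exercises, here is a minimal sketch that builds a skeletal `OUT_RUN` tree in a directory and reports missing pieces. Two things are assumptions based on the runner's output layout in this PR: the exact set of subdirectories and the single-key manifest. `build_minimal_out_run` / `missing_layout` are hypothetical names, not the helpers in `tests/test_boltz_trace_validate.py`.

```python
import json
import os
from typing import List

# Subdirectories taken from the runner's output layout; which of them the
# real test creates is an assumption.
SUBDIRS = ("pred", "attn_txt", "arc_png", os.path.join("act_npz", "pairformer_boltz"))


def build_minimal_out_run(root: str) -> str:
    """Create a skeletal OUT_RUN tree plus a manifest.json under root."""
    for sub in SUBDIRS:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    # Only run_dir is written here; the real manifest carries more keys.
    manifest = {"run_dir": os.path.realpath(root)}
    with open(os.path.join(root, "manifest.json"), "w") as f:
        json.dump(manifest, f, indent=2)
    return root


def missing_layout(root: str) -> List[str]:
    """List expected entries that are absent from an OUT_RUN tree."""
    expected = list(SUBDIRS) + ["manifest.json"]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```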
- `scripts/boltz/check_trace_format_fixtures.py` + committed `scripts/boltz/fixtures/*.txt` for trace text format checks.
- `tests/test_boltz_trace_validate.py`: builds a minimal `OUT_RUN` and runs `validate_boltz_traces.py --strict`, plus a negative test for a `manifest.json` `run_dir` mismatch.

Runner / Reproducibility
Environment build (scratch, GPU-ready, avoids conflicts)
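The smoke checks below are a Python sketch of the kind of verification the setup script performs; the real checks in `setup_boltz_env.sh` may differ. It only confirms that a CLI is on `PATH` and that the listed modules can be located. Checking for `torch` alongside `boltz` by default is an assumption.

```python
import importlib.util
import shutil
from typing import List, Sequence


def smoke_check(cli: str = "boltz", modules: Sequence[str] = ("boltz", "torch")) -> List[str]:
    """Return what is missing from the active environment (empty list = OK)."""
    problems = []
    if shutil.which(cli) is None:
        problems.append(f"CLI {cli!r} not found on PATH")
    for mod in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(mod) is None:
            problems.append(f"module {mod!r} not found")
    return problems
```

Running `smoke_check()` inside the activated env and getting an empty list means both the CLI and the module checks passed.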
- `environment_boltz.yml`, `scripts/boltz/boltz_pip_extras.txt`, `scripts/boltz/setup_boltz_env.sh`: `$SCR`, pip extras + pinned `boltz`, smoke checks, `boltz` CLI verification. `Boltz_Inference.md`).

Slurm runner
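The collision-avoiding `RUN_ID` described below can be sketched in Python for readers less used to shell parameter expansion. The timestamp format is hypothetical. What matters is the `${SLURM_JOB_ID:-$$}` suffix: the job ID, or the shell's PID when the variable is unset or empty. Here `os.getpid()` stands in for `$$`.

```python
import os
import time
from typing import Mapping, Optional


def make_run_id(env: Mapping[str, str] = os.environ, now: Optional[float] = None) -> str:
    """Build a run ID from a timestamp plus the Slurm job ID (or PID)."""
    # Hypothetical timestamp format; second resolution is why two jobs that
    # start in the same second would collide without the suffix.
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    # Like ${SLURM_JOB_ID:-$$}: fall back when the variable is unset *or* empty.
    job = env.get("SLURM_JOB_ID") or str(os.getpid())
    return f"{stamp}_{job}"
```

Two jobs launched in the same second share the timestamp but differ in the suffix, so their output directories never clash.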
- `scripts/boltz/run_boltz_trace.sbatch`, `scripts/boltz/run_boltz_trace.sh`
- `BOLTZ_ENV=$SCR/conda/envs/boltz_clean` (override with `BOLTZ_ENV` / `sbatch --export=ALL`).
- `RUN_ID` includes `${SLURM_JOB_ID:-$$}` to avoid collisions when multiple jobs start in the same second.
- `pred/`, `attn_txt/`, `arc_png/`, `act_npz/`, `manifest.json`; scratch caches (`HF_HOME`, `TORCH_HOME`, etc.).
- `boltz predict --no_kernels` for ICE GPU compatibility (H100 and other allocated GPU types per site `#SBATCH`).
- `components/pairformer_boltz/attn_txt/*.txt` to `attn_txt/` root for legacy plot/validate consumers.
- `expand_proxy_heads.py` when `TRACE_HEAD=all` (with an explicit log line about duplicated proxy heads).
- `validate_boltz_traces.py --strict` (fail the job if validation fails).

Manifest
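The realpath alignment that `--strict` enforces on `manifest.json` can be sketched as follows. Two things are assumptions: the `outputs` layout (a flat mapping of names to paths) and the function name `check_manifest`. The key idea is that both sides go through `os.path.realpath`, so symlinked or relative scratch paths still compare equal.

```python
import json
import os
from typing import List, Sequence


def check_manifest(run_dir: str, required: Sequence[str] = ("run_dir", "outputs")) -> List[str]:
    """Return manifest problems for a given --run_dir (empty list = OK)."""
    path = os.path.join(run_dir, "manifest.json")
    if not os.path.isfile(path):
        return ["manifest.json missing"]
    with open(path) as f:
        manifest = json.load(f)
    problems = [f"missing key {k!r}" for k in required if k not in manifest]
    root = os.path.realpath(run_dir)
    # Compare resolved paths so symlinks or relative spellings still match.
    if "run_dir" in manifest and os.path.realpath(manifest["run_dir"]) != root:
        problems.append(f"run_dir {manifest['run_dir']!r} does not match {root!r}")
    # Assumption: "outputs" maps names to paths that must resolve inside run_dir.
    for name, out in manifest.get("outputs", {}).items():
        target = os.path.realpath(os.path.join(root, out))
        if os.path.commonpath([root, target]) != root:
            problems.append(f"outputs.{name} resolves outside run_dir: {out!r}")
    return problems
```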
- `manifest.json`: `run_dir`, repo path + short git SHA, inputs, trace knobs, output subdirs, cache + Boltz flags; validated under `--strict` against the actual filesystem tree.

Validation + Visualization
Validation (`scripts/boltz/validate_boltz_traces.py`)
- `arc_png` count vs heads/layers/residues.
- `--strict` additionally requires:
  - `manifest.json` present with required keys and realpath-aligned `run_dir`/`outputs.*` vs `--run_dir`
  - `pred/` file tree
  - `attn_txt/component_status.json` (`msa`, `pairformer_boltz`, `sm_boltz` entries)
  - `act_npz/pairformer_boltz/*.npz` exist, with NumPy shape/key sanity checks

Arc diagram generation
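The `arc_png` count check mentioned above can be sketched as a comparison of expected and actual file counts per family. The following are assumptions, not the validator's actual rules:
- The per-family formulas: one MSA row PNG per head and layer, and one triangle start/end PNG per residue, layer, and head.
- The glob patterns for the truncated `tri_start`/`tri_end` names.

```python
import fnmatch
import os
from typing import Dict, Tuple

# Glob patterns per arc family; the tri_* patterns are guesses because the
# full file names are elided in the PR description.
ARC_PATTERNS = {
    "msa_row": "msa_row_head_*_layer_*_BOLTZ_arc.png",
    "tri_start": "tri_start_res_*.png",
    "tri_end": "tri_end_res_*.png",
}


def expected_arc_counts(heads: int, layers: int, residues: int) -> Dict[str, int]:
    """Assumed counts: MSA row per (head, layer); triangle per (residue, layer, head)."""
    per_triangle = residues * layers * heads
    return {"msa_row": heads * layers, "tri_start": per_triangle, "tri_end": per_triangle}


def actual_arc_counts(arc_png_dir: str) -> Dict[str, int]:
    """Count files in arc_png/ that match each family's pattern."""
    names = os.listdir(arc_png_dir)
    return {fam: len(fnmatch.filter(names, pat)) for fam, pat in ARC_PATTERNS.items()}


def arc_count_mismatches(
    arc_png_dir: str, heads: int, layers: int, residues: int
) -> Dict[str, Tuple[int, int]]:
    """Return {family: (expected, actual)} for every family that disagrees."""
    want = expected_arc_counts(heads, layers, residues)
    have = actual_arc_counts(arc_png_dir)
    return {fam: (want[fam], have[fam]) for fam in want if want[fam] != have[fam]}
```

With 4 heads, 2 layers, and 3 residues, these assumed formulas predict 8 MSA row PNGs and 24 each for triangle start and end.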
- `arc_png/msa_row_head_*_layer_*_BOLTZ_arc.png`
- `arc_png/tri_start_res_*_...`
- `arc_png/tri_end_res_*_...`

CI + repo hygiene
- `.github/workflows/undefined_names.yml`: new `boltz_trace_format` job (fixture checker + `unittest tests.test_boltz_trace_validate` on Python 3.11); existing `flake8` job unchanged.
- `.gitignore`: ignore large run artifacts; un-ignore `scripts/boltz/fixtures/*.txt` and `submission/issue42-demonstration-evidence/screenshots/*.png` (root `*.png` is otherwise ignored).

Documentation
- `docs/source/Boltz_Inference.md`: `expand_proxy_heads` semantics.
- `submission/issue42-demonstration-evidence/REPRODUCIBILITY.md` (commands + embedded screenshots).
- `README.md`: links the Boltz doc + submission `REPRODUCIBILITY.md`; fixes “Openfold” to “Openfold implementation” spelling on the OpenFold line.

How to test
CPU (matches CI)
ICE / Slurm (full pipeline)
Wait for `COMPLETED`, then:

Expected: Slurm stdout ends with `[PASS] validate_boltz_traces.py`; the strict re-run exits 0 with `[OK] manifest.json ... paths match --run_dir` and no `[FAIL]`.
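To script the pass/fail check instead of eyeballing the log, here is a minimal sketch. It encodes only the two expectations stated above: the last non-empty line carries `[PASS] validate_boltz_traces.py`, and no line contains `[FAIL]`. `check_slurm_stdout` is a hypothetical helper.

```python
from typing import List

PASS_MARK = "[PASS] validate_boltz_traces.py"


def check_slurm_stdout(text: str) -> List[str]:
    """Return reasons the Slurm stdout does not look like a passing run."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    # Expectation 1: the log ends with the validator's PASS line.
    if not lines or PASS_MARK not in lines[-1]:
        problems.append("last non-empty line is not the validator PASS line")
    # Expectation 2: no [FAIL] anywhere in the output.
    problems += [f"failure line: {ln}" for ln in lines if "[FAIL]" in ln]
    return problems
```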