
muon_calibration: ONNX-backed shift+smear reweight pipeline #692

Open
bendavid wants to merge 24 commits into WMass:main from bendavid:muon_reweight

Conversation

@bendavid
Collaborator

Summary

End-to-end pipeline for the muon-calibration shift+smear reweight model — from snapshot creation through training, ONNX/AOTI export, C++ helper integration, and histmaker wiring. Routes the J/ψ-stats, Z-non-closure, and closure-uncertainty helpers through the trained network by default, with the analytic Splines / Gaussian / massWeights paths still available via --muonScaleVariation.

Built on top of #691 (CI container v53 + narf TBB task-arena fix), which provided the ONNX-runtime / threading prerequisites.

Pipeline pieces

  • Snapshot (J/ψ + W/Z → per-muon Arrow IPC shards). New: flow_training_snapshot.py, arrow_shard_export.py; structured-record intermediate dtype; train/val/holdout split via per-shard record-batch ranges (default 80/10/10).
  • Training (BCE shift+smear-reweight head). New: train_shift_smear_reweight.py, train_muon_response_flow.py, arrow_shard_loader.py. MLP-factored architecture (trunk + shift / smear heads composing an inner-product reweight), bf16 trainer, muon_source conditioning ({W/Z prompt, τ-decay, J/ψ leg} → {-1, 0, +1}).
  • Export (shift_smear_reweight_export.py): single-file ONNX with weights inlined; muon_source remap baked into the graph; AOTI/ONNX with dynamic batch + N_var.
  • C++ helpers (wremnants/production/include/shift_smear_reweight_helpers.hpp):
    • shift_smear_reweight::ReweightModel (narf onnx wrapper) + ReweightEvaluator<NVar> (shared per-muon kinematics → ONNX → exp(log_r) core).
    • JpsiCorrectionsUncReweightHelper<T> and SmearingUncertaintyReweightHelper<...> as drop-ins for the analytic Splines variants.
    • ZNonClosureParametrizedReweightHelper{Corl,} and ZNonClosureBinnedReweightHelper{Corl,} (new) replace the splines linearisation with the trained reweight; shared shift-source code with the existing Splines helpers via two free z_non_closure_*_delta_r_kappa helpers.
    • SmearingHelperSimpleReweight and ScaleHelperSimpleReweight for the w_z_muonresponse.py --testHelpers closure path.
  • Bundled ONNX (wremnants-data submodule bump): two variants matching the resolution-smearing setting,
    shift_smear_reweight_mlp_factored_combined_{smearing,nosmearing}.onnx. The factories pick the matching one based on --noSmearing.

Histmaker integration

  • New CLI choice --muonScaleVariation onnxReweight (default). The Splines / Gaussian / massWeights paths are unchanged.
  • make_muon_smearing_helpers, make_jpsi_crctn_unc_helper, make_closure_uncertainty_helper, make_uniform_closure_uncertainty_helper, make_Z_non_closure_parametrized_helper, make_Z_non_closure_binned_helper all accept smearing=True and resolve the bundled ONNX path automatically; histmakers (mw_with_mu_eta_pt, mw_with_mu_eta_pt_VETOEFFI, mz_dilepton, mz_wlike_with_mu_eta_pt, w_z_muonresponse) pass smearing=not args.noSmearing.
  • New muon_calibration.jpsi_style_cols(df, helper, reco_sel_GF, response_weight_col) centralises the per-helper column-list selection (9-col with muon_source for ONNX, 7-col with response_weight for analytic Splines).
  • w_z_muonresponse.py --testHelpers adds ONNX reweight references (hist_qopr_smeared_weight_onnx, hist_qopr_scaled_weight_onnx) alongside the existing Splines / MC-sample references.
  • New plotter scripts/corrections/muon_calibration/plot_muonresponse_reweight_closure.py overlays the variants with a variant/MC-truth ratio panel (auto-rescaling the multi-replica reference, auto-zooming the ratio y-range).

Drive-by fix

  • SmearingHelperSimple{,Multi} were passing the raw signed sigmarel * qop to std::normal_distribution, which trips a libstdc++ stddev>0 assertion for q<0 muons. std::abs + zero-stddev guard.

Test plan

Verified on Wminustaunu_2016PostVFP and a Z prompt dataset (--maxFiles 1 -j1):

  • mz_dilepton.py (default --muonScaleVariation onnxReweight) runs end-to-end and produces output.
  • mz_dilepton.py --muonScaleVariation smearingWeightsSplines still works (no regression on the analytic path).
  • mz_dilepton.py --noSmearing runs end-to-end (auto-selects the _nosmearing ONNX).
  • w_z_muonresponse.py --testHelpers produces the closure histograms; plotter reproduces the expected permille-level ONNX scale closure and ~1% smear closure consistent with the per-model diagnostics.
  • CI on the v53 container.

🤖 Generated with Claude Code

bendavid and others added 7 commits May 22, 2026 15:35
…+train+export)

Adds the minimum file set needed to run the shift_smear_reweight
pipeline end-to-end on top of main:

  Snapshot (J/psi + W/Z -> per-muon Arrow IPC shards):
    scripts/corrections/muon_calibration/flow_training_snapshot.py
    wremnants/production/arrow_shard_export.py

  Training (BCE shift+smear reweight head):
    scripts/corrections/muon_calibration/train_shift_smear_reweight.py
    scripts/corrections/muon_calibration/train_muon_response_flow.py
    scripts/corrections/muon_calibration/arrow_shard_loader.py

  Export + diagnostics:
    scripts/corrections/muon_calibration/shift_smear_reweight_export.py
    scripts/corrections/muon_calibration/shift_smear_reweight_diagnostics.py

  Plumbing:
    wremnants/production/datasets/dataset_tools.py  (WREMNANTS_DATA_PATH override + FQDN host match)
    scripts/tests/testenv.py                        (env verification)
    CLAUDE.md                                       (container setup notes)
    narf / rabbit / wums                            (submodule pointers)

wremnants/production/muon_calibration.py is left identical to main; the
misctechnical changes to it (the dead make_parameterized_scale_shift_helper
and the flexible_define / globalidxv variants) ride along with a
module_corrections.hpp header that isn't part of this minimal set, so
sticking with main's helper code keeps the snapshot self-consistent.

Deliberately excluded (not needed for this pipeline): the polyhead /
score / flow-onnx variants, the J/psi calibration-tensor workflow
(aggregategrads, bake_couplings, make_jpsi_calibration_tensor,
fitresults_to_correctionResults), the C++ AOTI/ORT inference
benchmarks, the new muon_calibration.hpp / module_corrections.hpp
additions (J/psi tensor only), and the older mlp-only reweight export.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… ONNX path

Production follow-up to 046226b9 (minimal pipeline). The trained ONNX
model is now bundled in wremnants-data and used by default in the
muon-calibration uncertainty helpers.

Sharded snapshots and dataset splits
- arrow_shard_export: structured-record intermediate dtype (mixed
  int/float) for the bucket-shuffle writer.
- arrow_shard_loader: train/val/holdout partition via
  split_batch_range() over per-shard record batches (default
  80/10/10); each split reads only its own slice, avoiding
  epoch-edge full-shard scans.
- flow_training_snapshot: drop per-muon ``event`` column -- split is
  decided at consumer time, not snapshot time.
- train_shift_smear_reweight / train_muon_response_flow: CLIs and
  stats computation use the new split scheme.
- shift_smear_reweight_diagnostics: --split (default holdout), plus
  per-charge and per-source closure breakdowns.
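
A minimal sketch of the per-shard range split described above, assuming cumulative-fraction rounding over the record-batch count. The signature and the rounding rule here are illustrative assumptions; the actual ``split_batch_range()`` in arrow_shard_loader may differ.

```python
def split_batch_range(n_batches, split, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical sketch: contiguous record-batch range for one split."""
    names = ("train", "val", "holdout")
    idx = names.index(split)
    edges = [0]
    cumulative = 0.0
    for frac in fractions:
        cumulative += frac
        edges.append(min(n_batches, round(cumulative * n_batches)))
    edges[-1] = n_batches  # guard against floating-point round-off
    return range(edges[idx], edges[idx + 1])
```

Because each split is a contiguous slice of a shard's record batches, a reader for one split never needs to touch the batches that belong to the other two.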

Model export
- shift_smear_reweight_export: single-file ONNX with weights inlined
  by default (--inline-weights), muon_source {1,15,443} -> {-1,0,+1}
  remap baked into the graph, and AOTI export fixes (dynamic_shapes
  positional tuple, N-D linear decomposition for the factored heads).
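
For illustration only, the muon_source remap can be written in NumPy as below. In the PR the remap lives inside the exported ONNX graph; the function name ``remap_muon_source`` and the pairing 1 -> -1, 15 -> 0, 443 -> +1 (taken from the ordering in the message above) are assumptions of this sketch.

```python
import numpy as np

RAW_CODES = np.array([1, 15, 443])      # raw source codes, sorted ascending
NET_CODES = np.array([-1.0, 0.0, 1.0])  # values the network is conditioned on


def remap_muon_source(raw):
    """Hypothetical stand-in for the in-graph remap."""
    raw = np.asarray(raw)
    idx = np.clip(np.searchsorted(RAW_CODES, raw), 0, len(RAW_CODES) - 1)
    if not np.all(RAW_CODES[idx] == raw):
        raise ValueError("unexpected muon_source code")
    return NET_CODES[idx]
```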

C++ helpers and integration
- shift_smear_reweight_helpers.hpp (new):
  JpsiCorrectionsUncReweightHelper and SmearingUncertaintyReweightHelper.
  NCond bumped 5 -> 6; muon_source_from_gen_part_flav() maps the
  Muon_genPartFlav input to the network's expected source code.
- muon_calibration: bundled default ONNX path
  (wremnants-data/data/calibration/shift_smear_reweight_mlp_factored_combined.onnx);
  --muonScaleVariation default switched to ``onnxReweight``; a
  ``{reco_sel_GF}_muon_source`` RVec<int> column is injected when an
  ONNX reweight helper is active.

Submodule bumps:
- narf: TBB-task-arena thread index for the ONNX session pool
  (prevents slot collisions across RDataFrame loops in the same arena).
- wremnants-data: bundles the trained ONNX artifact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pton)

Three coupled fixes that the previous commit's defaults exposed:

shift_smear_reweight_helpers.hpp -- cling JIT crash
- narf::onnx_helper_alloc holds Ort::Env / std::vector<Ort::Session>
  (non-copyable). RDataFrame's Define / narf::DefineWrapper need to
  copy the callable into internal storage, so the helpers must be
  copyable too. Wrap onnx_ inside ReweightModel, and model_ inside
  each helper, in std::shared_ptr.
- ReweightModel::run was wrapping its tensor refs in std::cref before
  forwarding to narf::onnx_helper_alloc::operator(). narf::tensor_traits
  has no reference_wrapper<> specialisation -- ::get_sizes() then fails
  during cling's lazy instantiation, surfacing later as a recursive
  descent / illegal instruction during graph build. Switch to std::tie.
- Ort::Value::CreateTensor<T>(... T* p_data ...) is non-const even for
  input tensors, so run()'s input tensor refs must be non-const (the
  caller's stack-allocated scratch buffers).

muon_calibration.py -- aux closure helper routing
- The "...Splines..." vs analytic variants of the parametrised /
  binned Z non-closure helpers differ in whether they consume the
  per-muon response_weight column. dilepton with --muonScaleVariation
  onnxReweight also ships that column (network ignores its value, but
  it is present in the DataFrame), so route onnxReweight through the
  Splines variant too in make_Z_non_closure_parametrized_helper and
  make_Z_non_closure_binned_helper.

mz_dilepton.py -- ONNX-helper Define integration
- Build the SplinesDifferentialWeightsHelper (diff_weights_helper)
  whenever --muonScaleVariation is "smearingWeightsSplines" or
  "onnxReweight" (was only the former). This keeps input_kinematics
  consistent (always with response_weight) for the auxiliary closure
  helpers downstream.
- For the J/psi stats-uncertainty Define, if the helper is the ONNX
  reweight type, use a local 10-element column list that includes
  recoPhi/genPhi/muon_source/response_weight, leaving the 7-element
  input_kinematics intact for the analytic-style closure helpers
  below.

Verified end-to-end on Wminustaunu_2016PostVFP --maxFiles 1 -j1
for both --muonScaleVariation onnxReweight (new default) and
smearingWeightsSplines (no regression).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… evaluator

The J/psi-stats and smearing-uncertainty ONNX helpers used to inline the
per-muon kinematics packing, u_buf fill, model run and exp/clip step.
The Z non-closure scale-uncertainty helpers were still splines-only,
which forced the histmakers to keep building the SplinesDifferentialWeightsHelper
and the response_weight column even when --muonScaleVariation=onnxReweight.

shift_smear_reweight_helpers.hpp
- New shift_smear_reweight::ShiftReweightEvaluator<NVar>: per-muon raw
  kinematics + caller-supplied delta_r_kappa[NVar] -> alt_weights[NVar].
  Encapsulates the y_raw/c_raw build, the u_buf fill, the ONNX call and
  the exp/clip step. Owns the ReweightModel via shared_ptr so the host
  class stays copyable.
- JpsiCorrectionsUncReweightHelper refactored to compose ShiftReweightEvaluator;
  ~120 LOC of per-muon code collapses to ~25. Also drops response_weights
  from the column list (the network gives the full reweight directly,
  so the splines linearisation isn't consulted).
- SmearingUncertaintyReweightHelper drops response_weights for consistency
  (it still has its own σ-build logic since resolution variations are
  smear-only).
- Four new helpers: ZNonClosureParametrizedReweightHelper{Corl,}<T, N>
  and ZNonClosureBinnedReweightHelper{Corl,}<T, N, M>. NVar = 2 down/up
  per muon. The per-muon shift-source calculation (the bit that mirrors
  the existing Splines helpers' recoQopUnc loop) lives in two free
  functions z_non_closure_{param,binned}_delta_r_kappa, so the four
  classes share both the shift source AND the ShiftReweightEvaluator.

muon_calibration.py
- make_Z_non_closure_parametrized_helper / make_Z_non_closure_binned_helper:
  default scale_var_method to "onnxReweight"; add onnx_path / onnx_nslots
  kwargs. Route to the four new C++ classes for the ONNX path.
- _is_onnx_reweight_helper matches any of the six reweight-helper prefixes.
- jpsi_style_cols: ONNX branch returns a 9-col list (no response_weight) --
  the column is genuinely unused now.
- add_resolution_uncertainty: column list for ONNX no longer includes
  response_weight.
- add_jpsi_crctn_stats_unc_hists / add_jpsi_crctn_Z_non_closure_hists:
  use jpsi_style_cols where possible; the W histmaker path now handles
  the ONNX helpers correctly.

mz_dilepton.py
- diff_weights_helper is no longer built for --muonScaleVariation=onnxReweight
  (splines response_weight is completely bypassed in this mode).

Verified end-to-end on Wminustaunu_2016PostVFP and a Z prompt dataset
with both --muonScaleVariation=onnxReweight (default) and
smearingWeightsSplines, --maxFiles 1 -j1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``SmearingHelperSimple::operator()`` was passing ``dsigma = sigmarel_ * qop``
straight to ``std::normal_distribution{qop, dsigma}``. For q<0 muons
``qop < 0`` so ``dsigma < 0``, which trips libstdc++'s assertion
``_M_stddev > _RealType(0)`` and aborts (seen in
``scripts/histmakers/w_z_muonresponse.py``). Wrap in ``std::abs`` to
match what ``SmearingHelperSimpleMulti`` already did.

Also add a ``dsigma > 0`` guard in both helpers so a legitimate
``sigmarel == 0`` (no smearing) falls through cleanly -- the assertion
is strict greater-than and ``std::abs(0) == 0`` would still trip it.
For the Multi variant the δ-function case emits N copies of ``qop``.
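
The same guard can be sketched in Python with NumPy, whose ``Generator.normal`` likewise rejects a negative scale. This is an analogue of the C++ fix, not the helper itself; ``smear_qop`` is a hypothetical name.

```python
import numpy as np


def smear_qop(qop, sigmarel, rng, nreps=1):
    """Draw nreps smeared q/p values; hypothetical analogue of the C++ helpers."""
    dsigma = abs(sigmarel * qop)  # qop < 0 for q < 0 muons, so take the magnitude
    if dsigma > 0.0:
        return rng.normal(loc=qop, scale=dsigma, size=nreps)
    # Zero width: the smearing is a delta function, so emit copies of qop.
    return np.full(nreps, qop)
```

With ``sigmarel == 0`` the function returns ``nreps`` copies of ``qop`` without touching the random generator, which mirrors the δ-function case described above.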

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shift_smear_reweight_helpers.hpp
- Rename shift_smear_reweight::ShiftReweightEvaluator -> ReweightEvaluator.
  The namespace already carries the shift+smear context; the bare class
  name should just say what the type does, especially now that it handles
  both u and σ inputs. Touches the typedefs in J/psi-stats, Z non-closure
  (parametrized / binned, Corl & uncorrelated), and the new SimpleReweight
  helpers below.
- Extend ReweightEvaluator::evaluate with a (delta_r_kappa, sigma_r_kappa)
  overload; the shift-only signature now forwards to it with a zero σ
  array (existing callers unchanged).
- Refactor SmearingUncertaintyReweightHelper to compose ReweightEvaluator<NVar>
  instead of inlining the y/c/u/σ packing + ONNX call (~85 LOC -> ~40).
- Add SmearingHelperSimpleReweight and ScaleHelperSimpleReweight as ONNX
  drop-ins for wrem::SmearingHelperSimpleWeight / ScaleHelperSimpleWeight:
  same scalar reweight semantics, computed via the trained network
  instead of the analytic dweightd[mu|sigmasq] linearisation.

scripts/histmakers/w_z_muonresponse.py
- In --testHelpers, construct SmearingHelperSimpleReweight and
  ScaleHelperSimpleReweight using muon_calibration.default_shift_smear_reweight_onnx,
  Define selMuons_muon_source (per-muon Muon_genPartFlav passthrough,
  remapped to {-1, 0, +1} inside the ORT graph), compute weight_smear_onnx /
  weight_scale_onnx, and book the matching hist_qopr_*_weight_onnx
  histograms on the same axes as the splines counterparts.

scripts/corrections/muon_calibration/plot_muonresponse_reweight_closure.py
- New closure plotter. Loads the histmaker output, aggregates over MC
  processes, projects to qopr, and overlays nominal / MC-truth / splines
  reweight / ONNX reweight / splines transform with a variant-over-truth
  ratio panel. Emits both log-y and linear-y versions (4 files per
  closure type × 2 = 8 files / run).
- The MC-smeared reference (hist_qopr_smearedmulti) is filled with
  nreps=100 replicas per muon, so its integral is N_reps × the nominal
  integral. The plotter detects this empirically (integral ratio) and
  rescales so all curves share a common normalisation; for single-sample
  reference hists this is a no-op.
- Ratio-panel y-range auto-zooms to max |ratio - 1| in the visible
  x-range, restricted to bins with >=1% of the truth peak (filters
  noise tails); padded by 1.3×, clamped to [±0.5%, ±50%]. Manual
  override: --ratio-ylim LO HI.
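
The auto-zoom rule above can be sketched as follows, assuming the caller has already restricted the arrays to the visible x-range. The function name and argument names are hypothetical and not taken from the plotter.

```python
import numpy as np


def auto_ratio_halfwidth(ratio, truth, rel_thresh=0.01, pad=1.3, lo=0.005, hi=0.5):
    """Half-width of the ratio-panel y-range around 1 (hypothetical sketch)."""
    ratio = np.asarray(ratio, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Keep only bins with at least rel_thresh of the truth peak (drops noisy tails).
    mask = truth >= rel_thresh * truth.max()
    spread = np.max(np.abs(ratio[mask] - 1.0))
    # Pad the observed spread, then clamp to [0.5%, 50%].
    return float(np.clip(pad * spread, lo, hi))
```

The panel limits would then be ``(1 - halfwidth, 1 + halfwidth)``.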

Closure level verified consistent with the per-source diagnostic from
direct20/splitmlp_bf16_bce_exp_largetrunk_factshift: ~1% wavy residual
at sigmarel=5e-3 (= 0.3 σ_y), ~few permille on splines and ~0 on ONNX
at scalerel=5e-4 (= 0.03 σ_y).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The shift+smear-reweight network is conditioned on the reco pt, which
depends on whether the data/MC resolution-matching smearing helper has
been applied, so a model trained on a smeared snapshot must not be
applied to un-smeared reco kinematics (and vice versa). wremnants-data
now ships two bundled variants:

  shift_smear_reweight_mlp_factored_combined_smearing.onnx     (default)
  shift_smear_reweight_mlp_factored_combined_nosmearing.onnx

Replace the module-level ``default_shift_smear_reweight_onnx`` string
with a function ``default_shift_smear_reweight_onnx(smearing=True)``
that returns the matching path. Every factory that takes an ONNX path
(``make_muon_smearing_helpers``, ``make_jpsi_crctn_unc_helper``,
``make_closure_uncertainty_helper``, ``make_uniform_closure_uncertainty_helper``,
``make_Z_non_closure_parametrized_helper``, ``make_Z_non_closure_binned_helper``)
gains a ``smearing=True`` kwarg and falls back to the matching default
path when ``onnx_path`` is None.
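
A minimal sketch of the path resolution described above. The ``data_dir`` default is taken from the bundled path quoted in an earlier commit message and is an assumption here, as is the helper name ``resolve_onnx_path``; the real functions in muon_calibration.py may be organised differently.

```python
def default_shift_smear_reweight_onnx(
    smearing=True, data_dir="wremnants-data/data/calibration"
):
    """Return the bundled ONNX path matching the smearing setting (sketch)."""
    suffix = "smearing" if smearing else "nosmearing"
    name = f"shift_smear_reweight_mlp_factored_combined_{suffix}.onnx"
    return f"{data_dir}/{name}"


def resolve_onnx_path(onnx_path=None, smearing=True):
    """Hypothetical fallback used by a factory: explicit path wins, else default."""
    if onnx_path is not None:
        return onnx_path
    return default_shift_smear_reweight_onnx(smearing=smearing)
```

A histmaker would then call a factory with ``smearing=not args.noSmearing`` and leave ``onnx_path`` unset.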

Wrapper factories ``make_jpsi_crctn_helpers`` and
``make_Z_non_closure_helpers`` thread ``smearing`` through to the
per-helper builders (the latter reads ``args.noSmearing`` directly).

Histmaker call sites updated to pass ``smearing=not args.noSmearing``:
mw_with_mu_eta_pt, mw_with_mu_eta_pt_VETOEFFI, mz_dilepton,
mz_wlike_with_mu_eta_pt. ``w_z_muonresponse.py --testHelpers`` picks
the matching variant for its SmearingHelperSimpleReweight /
ScaleHelperSimpleReweight constructions too.

Also bump the wremnants-data submodule pointer to the
``add no-smearing ONNX; rename existing to _smearing suffix`` commit
that introduced the two-file bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bendavid and others added 4 commits May 22, 2026 16:55
CI linting on PR #692 failed on isort / flake8 / black. Apply the
same commands the workflow runs:

* ``isort . --skip narf --skip rabbit --skip wremnants-data --skip wums
  --profile black --line-length 88``
* ``black --exclude '(^\.git|\.github|\.ipynb|narf|rabbit|wremnants-data|wums)' .``
* drop F401 unused imports (``typing.Tuple``, ``typing.List``, ``hist``,
  ``import zuko`` next to ``from zuko.flows import ...`` blocks,
  ``PreprocStats`` in ``train_shift_smear_reweight.py``).
* mark ``scripts/tests/testenv.py`` imports with ``# noqa: F401``
  (the file exists exactly to verify those imports work).
* fix a real F821 in ``train_muon_response_flow.py``:
  ``_detach_pure_coefs_in_joint`` takes a ``polyhead`` argument but
  read ``head.is_pure_u | head.is_pure_sigma`` instead of
  ``polyhead.*``. Would crash at the first JOINT-mode batch when
  ``--detach-pure-{shift,smear}-in-joint`` is on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mpilable

CI's ``check C++ Files`` step runs ``clang++ -fsyntax-only`` on every
header alone. shift_smear_reweight_helpers.hpp depended on symbols
that muon_calibration.hpp brings in (``wrem::clip_tensor``,
``wrem::calculateQopUnc``, ``wrem::SmearingHelperParametrized``, the
``using ROOT::VecOps::RVec`` pulled into namespace wrem,
``narf::get_value``), and the runtime cling load order made it work,
but the standalone check failed.

Add a header guard to muon_calibration.hpp and #include it from
shift_smear_reweight_helpers.hpp. With the guard, re-includes in cling
become no-ops, so the runtime ``narf.clingutils.Declare`` of both
headers still works.

Verified standalone:
  clang++ -I./narf/narf/include/ -I./wremnants/include/ \
          -I .../ROOT/include -std=c++20 -fsyntax-only \
          wremnants/production/include/shift_smear_reweight_helpers.hpp

passes (exit 0) for both headers. mz_dilepton and w_z_muonresponse
--testHelpers still produce output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The W (``mw_with_mu_eta_pt.py``) and W-like (``mz_wlike_with_mu_eta_pt.py``)
histmakers crashed under the new ``--muonScaleVariation=onnxReweight``
default with

  ``Column "nominal_muonScaleSyst_responseWeights_tensor" is not in a dataset
  and is not a custom column been defined``

(``add_jpsi_crctn_stats_unc_hists``)

and

  ``10 column names are required but 7 were provided`` for
  ``JpsiCorrectionsUncReweightHelper`` (``muonScaleClosSyst_responseWeights_tensor_splines``).

Two pieces:

* ``muon_calibration.add_jpsi_crctn_stats_unc_hists`` gains an
  ``onnxReweight`` branch parallel to ``smearingWeightsSplines``: build
  the ONNX column list via ``jpsi_style_cols``, Define
  ``muonScaleSyst_responseWeights_tensor_onnx``, and alias
  ``nominal_muonScaleSyst_responseWeights_tensor`` to it.

* ``mw_with_mu_eta_pt`` and ``mz_wlike_with_mu_eta_pt`` route every
  inline Define that consumes ``closure_unc_helper{,_A,_M}``,
  ``z_non_closure_parametrized_helper``, and (in wlike)
  ``data_jpsi_crctn_unc_helper`` through ``jpsi_style_cols`` so each
  helper sees the matching column list -- 10-col with ``muon_source``
  for ONNX, 7-col with ``response_weight`` for analytic Splines.

Verified end-to-end:
  python scripts/histmakers/mw_with_mu_eta_pt.py    --filterProcs Wminusmunu_2016PostVFP --maxFiles 1
  python scripts/histmakers/mz_wlike_with_mu_eta_pt.py --filterProcs Zmumu_2016PostVFP        --maxFiles 1
both produce output. isort / flake8 / black all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
setupRabbit smoothing of the ``muonResolutionSyst_responseWeights``
systematic crashed with ``numpy.linalg.LinAlgError: Singular matrix``
in ``solve_leastsquare`` (and ``solve_nonnegative_leastsquare``).

The fake-rate ABCD smoothing fits a small-window polynomial per bin.
When too many bins in a window are 0 / negative (the ~10% rate we
see across most systematics), the X^T X matrix for that window is
rank-deficient. ``np.linalg.inv`` raises; ``np.linalg.pinv``
(Moore-Penrose pseudo-inverse) returns the minimum-norm solution
and is numerically equivalent for full-rank matrices.

The pinv also serves as the parameter covariance returned by both
functions: in the rank-deficient direction it returns 0, which is
the right downstream behaviour -- no spurious large uncertainty in
a direction the data couldn't constrain.
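
As a sketch of the behaviour described above (not the actual ``solve_leastsquare`` code, whose signature is not shown here), a normal-equations fit using the pseudo-inverse looks like this. ``lstsq_via_pinv`` is a hypothetical name.

```python
import numpy as np


def lstsq_via_pinv(X, y):
    """Least-squares fit via the pseudo-inverse of X^T X (hypothetical sketch)."""
    XtX = X.T @ X
    cov = np.linalg.pinv(XtX)  # equals inv(XtX) whenever XtX has full rank
    params = cov @ (X.T @ y)
    return params, cov


# Rank-deficient window: the two columns are identical, so X^T X is singular.
X_deg = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_deg = np.array([2.0, 4.0, 6.0])
params_deg, cov_deg = lstsq_via_pinv(X_deg, y_deg)
```

In the degenerate example the fit splits the coefficient evenly between the two identical columns (the minimum-norm solution), and the covariance has no component along the unconstrained direction (1, -1).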

Triggered downstream of WMass#692 (ONNX-backed
``SmearingUncertaintyReweightHelper`` produces ~74 eigenvariations
with wider per-event tails than the analytic helper; the smoothing
hits a degenerate window for one of them and crashes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bendavid
Collaborator Author

Modest differences in the CI for the impacts, which could be expected. Will run the full stats CI for a more robust comparison.

@bendavid
Collaborator Author

There are some general problems that prevent the long-CI from finishing (also for the reference). I'll have a look at this.

In the meantime the comparison is available for mZ (dilepton mass) and W-like. The muon calibration uncertainty for Wlike decreases from 5.39 to 4.61 MeV, which is probably larger than justified (this would imply that the splines had a pretty big spurious contribution to the uncertainty). More importantly for the dilepton mass, the calibration impact decreases from 4.62 to 1.38 MeV, which clearly makes no sense, so I will follow up on what is wrong there.

bendavid and others added 13 commits May 23, 2026 16:20
Empirically the deployed shift+smear reweight network's effective
first-order gradient is calibrated for u in qop space, not r_κ space.
Per-eigenvariation half-diff comparison on Wmunu/Zmumu with --validationHists
showed the ONNX variations ~100× LARGER than the analytic-Splines path
when u was supplied as δr_κ = δqop · p_gen · sign(q_gen). Dropping the
``p_gen · sign(q_gen)`` factor (i.e. passing δqop in GeV⁻¹) brings ONNX
and Splines magnitudes into ~1× agreement across all three processes.

ReweightEvaluator::evaluate docstring updated to reflect that
u_raw / sigma_raw are in physical units (GeV⁻¹, radians, radians)
matching y_raw component-by-component, NOT r_κ.

Callers updated to drop the p_gen factor:
* JpsiCorrectionsUncReweightHelper: ``dr = recoQopUnc`` (was
  ``recoQopUnc · pgen · sign_qgen``).
* SmearingUncertaintyReweightHelper: σ_raw = sqrt(dsigmarelsq · qop_reco²)
  (was sqrt(... · pgen²)).
* z_non_closure_param_delta_u_raw / z_non_closure_binned_delta_u_raw
  (renamed from ``*_delta_r_kappa``): drop p_gen · sign_qgen, no longer
  take gen kinematics.
* SmearingHelperSimpleReweight: σ_raw = sigmarel / p_reco (was
  sigmarel · pgen / preco).
* ScaleHelperSimpleReweight: u_raw = scalerel · q_reco / p_reco
  (was scalerel · q_reco · sign_qgen · pgen / preco).
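
To make the unit change concrete, here is a small worked example of the two SimpleReweight inputs using the formulas listed above. The function names and the numerical values (a 40 GeV negative muon with p_gen equal to p_reco) are illustrative assumptions.

```python
def scale_u_raw(scalerel, q_reco, p_reco):
    """New convention: shift in q/p, in GeV^-1."""
    return scalerel * q_reco / p_reco


def scale_u_r_kappa(scalerel, q_reco, sign_qgen, p_gen, p_reco):
    """Previous convention: the same shift multiplied by p_gen * sign(q_gen)."""
    return scalerel * q_reco * sign_qgen * p_gen / p_reco


def smear_sigma_raw(sigmarel, p_reco):
    """New convention: smearing width in q/p, in GeV^-1."""
    return sigmarel / p_reco


u_new = scale_u_raw(5e-4, -1, 40.0)
u_old = scale_u_r_kappa(5e-4, -1, -1, 40.0, 40.0)
```

For this muon the two inputs differ in magnitude by exactly p_gen (a factor of 40), which shows how large the mismatch in the network input was before the fix.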

Verified per-eigenvariation half-diff RMS over 144 J/ψ-stats variations,
mw_with_mu_eta_pt --validationHists --maxFiles 5:

  Process      | RMS splines | RMS ONNX | ratio (was)
  Wplusmunu    | 2.21        | 2.00     | 0.91  (120×)
  Wminusmunu   | 2.52        | 1.50     | 0.59  ( 54×)
  Zmumu        | 0.88        | 0.94     | 1.06  (180×)

mz_dilepton, w_z_muonresponse --testHelpers still produce output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… reweight

ReweightModel::run and ReweightEvaluator::evaluate declared their y / c / u
/ σ / log_r buffers with Eigen's default ColMajor layout, but
onnx_helper_alloc forwards .data() to ORT, which interprets the buffer as
row-major. For shape (1, NVar, F) with NVar ≥ 2 the two layouts disagree
on how the (NVar, F) plane is packed in memory, so the per-variation u
and σ inputs were silently scrambled — variation 0 saw a non-physical
mixed-axis shift and variation 1 came back as exactly exp(0) = 1.0.

NVar = 1 escaped because length-1 axes are layout-invariant, which is why
ScaleHelperSimpleReweight and SmearingHelperSimpleReweight (used by
w_z_muonresponse --testHelpers) kept matching the splines reference; the
broken path was every NVar ≥ 2 helper — JpsiCorrectionsUncReweightHelper
(NVar = 2·nUnc), all four Z-non-closure variants (NVar = 2),
SmearingUncertaintyReweightHelper, and the closureA / closureM helpers
(NVar = 2). Declaring the buffers Eigen::RowMajor fixes the byte order
with no call-site changes.
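
The layout mismatch can be reproduced in NumPy, which is used here as a stand-in for the Eigen-to-ORT handoff. The shapes and values are made up for illustration: a column-major buffer is reinterpreted as row-major for shape (1, NVar, F).

```python
import numpy as np

NVAR, F = 2, 3
u = np.zeros((1, NVAR, F))
u[0, 0, 0] = 1.0  # variation 0: shift on feature axis 0
u[0, 1, 0] = 2.0  # variation 1: shift on feature axis 0

# Bytes as a column-major tensor would store them, read back as row-major.
col_major_bytes = u.ravel(order="F")
misread = col_major_bytes.reshape(u.shape)

# Bytes of a row-major tensor round-trip unchanged.
row_major_bytes = u.ravel(order="C")
correct = row_major_bytes.reshape(u.shape)

# With a single variation the two layouts coincide.
single = np.arange(3.0).reshape(1, 1, 3)
same_for_one_var = np.array_equal(single.ravel(order="F"), single.ravel(order="C"))
```

In ``misread``, variation 0 picks up both shifts on different feature axes and variation 1 is all zeros, which matches the symptoms described above (a mixed-axis shift and a weight of exactly 1).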

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… call sites

The signature no longer takes ``tflite_file`` as the second positional arg
or ``dummy_mu_scale_var`` / ``dummy_var_mag`` as kwargs; three call sites
in ``add_jpsi_crctn_stats_unc_hists`` still passed them, which would have
errored at runtime (and was silently feeding ``tflite_file`` in as
``scale_A``).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ference

For shifts the small-magnitude rows of the closure plot are dominated by
h_shifted's per-bin Poisson noise — too few events cross any bin edge for
the literal reference to be informative. Add an ``h_lin`` curve computed
directly from h_raw via centered finite differences,

    h_lin(y_j; δy) = h_raw[j] − δy · (∂h/∂y)[j],

which matches h(y − δy) at linear order with no extra sampling noise.
For smears the analog is the σ² Taylor term,

    h_lin(y_j; σ) = h_raw[j] + ½ σ² · (∂²h/∂y²)[j].

Below ``--lin-noise-thresh`` average per-bin event crossings (default
100, S/N ≈ √100 ≈ 10) the literal reference is considered
noise-dominated and the lin reference takes over as the ratio-panel
denominator and y-range setter; otherwise the literal stays primary.
Both curves are always drawn — lin in red, the literal in black, the
pred ratio (blue for shift / green for smear) colored by numerator.
Pad floor on the ratio panel y-range is also removed so the auto-zoom
follows the actual pred/ref ratio spread.
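
The two linearised references can be sketched with ``np.gradient``, which uses centered differences in the interior. The function names are hypothetical, and the sketch assumes ``h_raw`` is sampled at the bin centres ``y``; the plotter's actual implementation may differ.

```python
import numpy as np


def lin_shift_reference(h_raw, y, dy):
    """h_lin for a shift: h_raw - dy * dh/dy."""
    return h_raw - dy * np.gradient(h_raw, y)


def lin_smear_reference(h_raw, y, sigma):
    """h_lin for a smear: h_raw + 0.5 * sigma^2 * d2h/dy2."""
    d2h = np.gradient(np.gradient(h_raw, y), y)
    return h_raw + 0.5 * sigma**2 * d2h
```

Both references are built from ``h_raw`` alone, so they carry no sampling noise beyond what is already in the nominal histogram.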

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``make_muon_smearing_helpers`` now takes ``scale_var_method`` and only
builds the ONNX-backed ``SmearingUncertaintyReweightHelper`` when the
caller selected ``onnxReweight``; for ``smearingWeightsSplines`` /
``smearingWeightsGaus`` / ``massWeights`` it falls back to the analytic
``SmearingUncertaintyHelperParametrized`` that consumes the splines
``response_weight`` column. Previously the smearing helper always
picked the ONNX path regardless of the flag, so a side-by-side
ONNX-vs-splines comparison on the resolution syst impact was
impossible from CLI alone.

The five histmaker call sites (mz_dilepton, mw_with_mu_eta_pt,
mw_with_mu_eta_pt_VETOEFFI, mz_wlike_with_mu_eta_pt, w_z_muonresponse)
now forward ``scale_var_method=args.muonScaleVariation``. The two
``flow_training_snapshot.py`` callers keep the default (always-ONNX)
since they run from the training pipeline, not the analysis-side
histmakers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The network parameterises σ as a positive magnitude (and is even in σ
since Gaussian smearing is symmetric under σ→−σ), so the previous
helper clamped any eigenvariation with ``dsigmarelsq<0`` (a reduce-
resolution direction) to σ=0 and returned exp(0)=1 — a no-op. The
analytic ``SmearingUncertaintyHelperParametrized`` instead applies the
signed linearisation 1+∂_σ²·δσ², so the two diverged exactly on the
"shrink the resolution" half of every Hessian eigendirection.

Recover the signed response without retraining by feeding |δσ²| to ONNX
and applying the leading-order odd symmetry of log_r in δσ² afterwards:
log_r(−|δσ²|) ≈ −log_r(+|δσ²|), i.e. alt_weight → 1/alt_weight when
dsigmarelsq<0. Exact to first order in δσ²; higher-order curvature
terms get the wrong sign in this approximation, but those are
O((δσ²)²) and negligible for the 1σ eigenvariations the helper is
ever called with.
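
A minimal sketch of the sign handling, with a toy callable standing in for the network's log_r as a function of |δσ²|. The function name and the toy model are assumptions for illustration only.

```python
import math


def signed_smear_weight(dsigmarelsq, log_r_of_abs):
    """Evaluate at |dsigmarelsq| and invert the weight for negative variations."""
    log_r = log_r_of_abs(abs(dsigmarelsq))
    return math.exp(log_r if dsigmarelsq >= 0 else -log_r)


def toy_log_r(x):
    # Toy stand-in for the network: log_r linear in the variance shift.
    return 3.0 * x


w_up = signed_smear_weight(+1e-3, toy_log_r)
w_down = signed_smear_weight(-1e-3, toy_log_r)
```

By construction ``w_up * w_down == 1``, and for small variations ``w_down`` agrees with the signed linearisation ``1 - 3e-3`` of the toy model to first order.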

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pair with bendavid/narf#44. The narf submodule is bumped to the
``onnx-helper-iobinding`` tip so ``narf::onnx_helper`` exposes the
new persistent-buffer + IoBinding + TensorMap-based fill interface
and ``narf::onnx_helper_alloc`` is gone.

ReweightModel becomes a class template on ``NVar`` and constructs
``narf::onnx_helper`` once with the explicit input/output shapes
[{1,F}, {1,NCond}, {1,NVar,F}, {1,NVar,F}] / [{1,NVar}], pinning
the model's dynamic axes at instantiation. ``run`` is no longer a
function template — it takes the fixed-shape RowMajor Eigen
tensors directly, inputs as ``const &``. ``ReweightEvaluator<NVar>``
holds a ``std::shared_ptr<ReweightModel<NVar>>`` and calls
``model_->run(...)`` without a template argument.

Per-call we no longer pay for Ort::Value construction or name array
setup — the IoBinding is one-shot at construction; ``operator()``
copies in/out through ``Eigen::TensorMap<…, RowMajor>`` views over
the persistent ORT-owned buffers. The RowMajor declarations on the
caller-side Eigen tensors still match the model shape for the copy
to be a simple memcpy, but they no longer need to match the
network's storage order for correctness — Eigen's cross-layout
assignment handles that.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pton externalPostfit step

The high-stats workflow (``setup 1:1 data:mc events`` branch of
``setenv``) was timing out at NTHREADS=64; bump to 128 to keep
under the per-step time budget.

The ``dilepton ptll from wlike`` step invokes ``rabbit_fit.py`` with
``--externalPostfit`` (loading the wlike fit's ``uncorr`` postfit
values) and no local re-minimisation. ``rabbit_fit.py`` then
unconditionally tries to compute an EDM + covariance from the local
Hessian at that externally-supplied point, which is generically
indefinite off-minimum — Cholesky failed at the 4th leading minor
and the step exited non-zero, even though the fit-results file had
already been written. Add ``--noHessian`` (already wired in
``rabbit_fit.py`` at line 298) to skip that local Hessian / EDM
computation; the downstream plotting step reads the saved
``fitresults_from_ZMassWLike_eta_pt_charge.hdf5`` and the
externally-loaded postfit covariance, neither of which depended on
the indefinite local Hessian.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…MassDilepton externalPostfit step"

This reverts commit 24ddf17.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bendavid
Collaborator Author

There were some technical issues (mismatched memory order for Eigen tensors vs. ONNX, etc.) which were leading to nonsense in evaluating the model. These are fixed now.

The standard CI now matches extremely well against the reference.

The high stats CI reference is currently broken for the mW fit, but the wlike and dilepton mass agree extremely well.

For the high stats dilepton mass comparison, which is the most sensitive, the numbers now look like:

reference (https://github.com/WMass/WRemnants/actions/runs/26323635314/job/77497221066):

   resolutionCrctn: 0.00684
   binByBinStat: 0.00741
   theory: 0.00766
   stat: 0.01034
   scaleClosCrctn: 0.0127
   scaleClosACrctn: 0.02085
   scaleCrctn: 0.03891
   muonCalibration: 0.04622
   expNoLumi: 0.04626
   experiment: 0.04626
   massShift: 0.04886
   Total: 0.04886

this PR (https://github.com/WMass/WRemnants/actions/runs/26368527418/job/77616529258):

   resolutionCrctn: 0.00609
   theory: 0.00722
   binByBinStat: 0.00741
   stat: 0.01034
   scaleClosCrctn: 0.01263
   scaleClosACrctn: 0.02071
   scaleCrctn: 0.0386
   muonCalibration: 0.04594
   expNoLumi: 0.04598
   experiment: 0.04598
   massShift: 0.04859
   Total: 0.04859

So there are small differences as can be expected, but group by group the impacts match very well.

I've reverted the changes to the high stats CI since this can/should be discussed separately in #693.

In principle once the standard CI runs again (runners are currently down) this should be good to merge (pending review of course)

@bendavid
Collaborator Author

For some more details, here are variation-by-variation comparisons with respect to the splines for scale and resolution:

[Two images: variation-by-variation comparisons against the splines, for scale and for resolution]
