
Inference output time coordinate can decouple from the loaded initial condition (multi-IC / ensemble runs) #1315

Description

@brianhenn

Summary

When running inference over multiple initial conditions (ICs), optionally × ensemble members, the time coordinate written to each output is not guaranteed to correspond to the initial condition that was actually loaded for that output. If the ordering used to assign output times differs from the order in which ICs are loaded, outputs are silently mislabeled in time: no error is raised, and the predictions themselves are physically correct, so the bug is invisible unless you verify content against the label. This breaks any date-based verification against a reference dataset and any downstream analysis that reads the init time from the output's time[0].

This is a correctness bug in the inference output path (fme/ace/inference/: the initial-condition data loading and the time coordinate propagated through loop.py and data_writer/), not a problem specific to one experiment. Any multi-IC and/or multi-member inference is potentially affected.

Expected vs actual

  • Expected: an output for initial condition i (member j) carries IC i's own timestamp, i.e. output.time[0] == IC_i.time + lead_step.
  • Actual: the output time is assigned from a source that can be ordered differently from the ICs as loaded, so output.time and the loaded IC content come from two different orderings of the same date list.

Likely mechanism

The output time appears to be derived from an index/base ordering rather than propagated from each loaded IC's timestamp. With N ICs and M ensemble members, the IC content is enumerated in one order while the time is enumerated in another (e.g. IC-major vs member-major), so they line up only by coincidence at the endpoints. The fix is to carry each loaded IC's timestamp through to the writer rather than reconstruct it from a counter.
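A toy illustration of the suspected mechanism, assuming (as in the example above) that IC content is enumerated IC-major while the time counter walks member-major. Years stand in for full timestamps, and all function names are hypothetical, not fme code. The last function shows the proposed fix: the label is read off the IC that was loaded, so it cannot drift from the content.

```python
from itertools import product


def ic_major_content(ic_times, n_members):
    """Init time of the IC actually loaded for each output slot (IC-major)."""
    return [t for t, _member in product(ic_times, range(n_members))]


def member_major_labels(ic_times, n_members):
    """Init time stamped by a counter that walks the date list per member (member-major)."""
    return [t for _member in range(n_members) for t in ic_times]


def propagated_labels(loaded):
    """Proposed fix: each output's label comes from the (ic_time, member) pair it was loaded from."""
    return [ic_time for ic_time, _member in loaded]


ic_times = [1977, 1978, 1979]
n_members = 2
content = ic_major_content(ic_times, n_members)
stamped = member_major_labels(ic_times, n_members)
matches = [c == s for c, s in zip(content, stamped)]
print(matches)  # [True, False, False, False, False, True]

loaded = list(product(ic_times, range(n_members)))
assert propagated_labels(loaded) == content
```

Only the first and last slots agree, which is the "line up only at the endpoints" pattern described above.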

Suggested fix

  • Ensure the prediction output time coordinate is propagated from the initial condition that was loaded (single source of truth: the IC dataset's own time), through loop.py and the data_writer, for every (IC, member) combination.
  • Add a regression test (e.g. alongside fme/ace/inference/data_writer/test_*.py or test_inference.py): run a small inference over ≥2 ICs × ≥2 members with distinct, known IC dates and assert each output's time[0] equals that IC's date + lead step. The current behavior would pass a single-IC test but fail this one — which is why it slipped through.
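A minimal sketch of the assertion such a regression test could make, assuming the test already knows which IC was loaded for each output (for example, by giving each IC a distinct constant field so it can be identified from the output content). The helper name and the 6-hour lead step are hypothetical; this does not call any fme API.

```python
import numpy as np


def assert_output_times_match_ics(first_times, loaded_ic_times, lead_step):
    """Check output.time[0] == IC_i.time + lead_step for every (IC, member) output.

    first_times: time[0] written to each output, in output order.
    loaded_ic_times: init time of the IC actually loaded for each output, same order.
    """
    first_times = np.asarray(first_times, dtype="datetime64[s]")
    expected = np.asarray(loaded_ic_times, dtype="datetime64[s]") + lead_step
    bad = np.flatnonzero(first_times != expected)
    assert bad.size == 0, f"mislabeled outputs at indices {bad.tolist()}"


# >=2 ICs x >=2 members with distinct, known IC dates.
ic_times = np.array(["1977-01-01T00", "1978-01-01T00"], dtype="datetime64[s]")
n_members = 2
lead = np.timedelta64(6, "h")
loaded = np.repeat(ic_times, n_members)  # IC-major: 1977, 1977, 1978, 1978

assert_output_times_match_ics(loaded + lead, loaded, lead)  # correct labels pass

member_major = np.tile(ic_times, n_members) + lead  # buggy stamping: 1977, 1978, 1977, 1978
try:
    assert_output_times_match_ics(member_major, loaded, lead)
except AssertionError as err:
    print(err)  # mislabeled outputs at indices [1, 2]
```

A single-IC run would pass this check either way, since both orderings collapse to the same list.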

Concrete reproduction (the bug as observed)

Run: gs://vcm-ml-intermediate/2026-06-16-ace2s-land-feedback-inference/frameworkB-era5/segment_*/landfeedback_ic{NNNN}.zarr (branch exp/ace2s-land-feedback-inference), 96 outputs = 48 init-years (1977–2024) × 2 members.

For each output I correlated its day-1 deseasonalized anomaly against ERA5's same-calendar-day anomaly for every candidate year (spatial Pearson):

  • Every output matches some ERA5 year at r ≈ 0.99 (content is valid).
  • The correlation at the year written in the time coordinate is ≈ 0 for 94/96 outputs (only the two endpoints, m = 0 and m = 95, coincide).
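The content-vs-label check above can be sketched as follows, assuming the deseasonalized day-1 anomalies have already been computed as 2-D arrays. The correlation here is unweighted over grid points; the issue does not say whether area weighting was used, so treat that as an assumption. The function names and the synthetic fields are hypothetical.

```python
import numpy as np


def spatial_pearson(a, b):
    """Unweighted Pearson correlation between two 2-D fields over all grid points."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def match_year(output_anom, reference_anoms):
    """Correlate one output's anomaly with each candidate year's same-calendar-day anomaly.

    reference_anoms: dict mapping year -> reference anomaly field.
    Returns (best_year, {year: r}).
    """
    scores = {year: spatial_pearson(output_anom, ref) for year, ref in reference_anoms.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic check: an "output" whose content is 1979 but whose label says 1980.
rng = np.random.default_rng(0)
refs = {year: rng.standard_normal((8, 16)) for year in range(1977, 1982)}
output = refs[1979] + 0.1 * rng.standard_normal((8, 16))
stamped_year = 1980

best, scores = match_year(output, refs)
assert best == 1979 and best != stamped_year
```

An output is flagged when `best` differs from the year in its time coordinate, i.e. when r is high somewhere but near zero at the stamped year.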

The label vs content relationship is a deterministic permutation of the output index m (0–95):

TRUE  year (content)  = 1977 + floor(m/2)     # IC-major: 1977,1977,1978,1978,…
STAMPED year (time)   = 1977 + (m mod 48)      # member-major: 1977…2024, then again
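The two formulas can be checked directly. This sketch only restates them for the 48 × 2 run; the constant names are illustrative.

```python
N_YEARS, N_MEMBERS, FIRST_YEAR = 48, 2, 1977


def true_year(m):
    """Year of the IC actually loaded for output m (IC-major)."""
    return FIRST_YEAR + m // N_MEMBERS


def stamped_year(m):
    """Year written to output m's time coordinate (member-major)."""
    return FIRST_YEAR + m % N_YEARS


outputs = range(N_YEARS * N_MEMBERS)
coincide = [m for m in outputs if true_year(m) == stamped_year(m)]
print(coincide)                      # [0, 95]
print(len(outputs) - len(coincide))  # 94
```

This reproduces the observed 94/96 mislabeled outputs with agreement only at the two endpoints.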

Likely also affects frameworkB-cm4-rs0 / frameworkB-cm4-rs1 (different IC count — not yet checked).

Impact / severity

  • Silent: no error; predictions are physically valid, so it is undetectable without content-vs-label verification.
  • Corrupts any lead-time skill verification (forecast aligned to the wrong truth year).
  • Existing affected outputs are recoverable by relabeling (the true time is deterministically derivable), so data need not be regenerated — but the code must be fixed so future inference runs are correct, and a test added so it cannot regress.
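A minimal relabeling sketch for existing outputs, assuming the IC-major content ordering above and that all ICs share one calendar date and hour (the June 1 date and 6-hour lead step below are made up for illustration). It rebuilds each output's time axis so that time[0] equals the true IC time plus the lead step, preserving the spacing between steps. Function names are hypothetical.

```python
from datetime import datetime, timedelta

FIRST_YEAR, N_MEMBERS = 1977, 2


def true_init_time(m, month, day, hour=0):
    """Init time of the IC actually loaded for output m (IC-major ordering)."""
    return datetime(FIRST_YEAR + m // N_MEMBERS, month, day, hour)


def relabel(times, true_init, lead_step):
    """Rebuild a time axis so that time[0] == true_init + lead_step.

    Offsets between steps are preserved; only the anchor moves.
    """
    return [true_init + lead_step + (t - times[0]) for t in times]


m = 1  # content is 1977, but the label says 1978
lead = timedelta(hours=6)
stamped = [datetime(1978, 6, 1, 6) + k * lead for k in range(3)]
fixed = relabel(stamped, true_init_time(m, month=6, day=1), lead)
print(fixed[0])  # 1977-06-01 06:00:00
```

Anchoring on the true IC time plus timedelta offsets avoids shifting by a fixed number of days, which would go wrong across leap years.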

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests