Skip to content

Add CorrectorOutput with CorrectorDiagnostics containing delta = corrected - network_output#1321

Draft
jpdunc23 wants to merge 3 commits into
mainfrom
feature/corrector-diagnostics
Draft

Add CorrectorOutput with CorrectorDiagnostics containing delta = corrected - network_output#1321
jpdunc23 wants to merge 3 commits into
mainfrom
feature/corrector-diagnostics

Conversation

@jpdunc23

@jpdunc23 jpdunc23 commented Jun 26, 2026

Copy link
Copy Markdown
Member

Introduces the value objects and helper that let a corrector report, per step, which variables it adjusted and by how much, without a downstream caller having to infer it. CorrectorDiagnostics holds a single delta mapping where delta[name] = corrected[name] − network_output[name] for each variable a corrector declares it touched; the raw pre-correction network value stays recoverable as output − delta. CorrectorOutput bundles the corrected output, its diagnostics, and the passthrough corrector state, giving the corrector a named return value that can grow new per-step payloads without breaking call sites. build_corrector_diagnostics computes the delta from an explicit declared touched-name set rather than by diffing tensors by object identity.

This is a pure addition: nothing returns or consumes these types yet, so corrected outputs, rollout state, losses, logged metrics, and saved files are unchanged, and existing checkpoints load unchanged. A follow-up change has the corrector return CorrectorOutput and removes the current object-identity capture; this PR lands the foundation in isolation so it can be reviewed on its own.

Changes:

  • fme.core.corrector.output.CorrectorDiagnostics — new value object with a single delta TensorMapping, defaulting to empty.

  • fme.core.corrector.output.CorrectorOutput — new value object bundling corrected, diagnostics, and a passthrough corrector_state.

  • fme.core.corrector.utils.build_corrector_diagnostics — new helper computing delta = corrected − input_snapshot over an explicit touched-name set (cumulative for a name touched more than once).

  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

… corrected - network_output`

Introduces the value objects and helper that let a corrector report, per step, which variables it adjusted and by how much, without a downstream caller having to infer it. `CorrectorDiagnostics` holds a single `delta` mapping where `delta[name] = corrected[name] − network_output[name]` for each variable a corrector declares it touched; the raw pre-correction network value stays recoverable as `output − delta`. `CorrectorOutput` bundles the corrected output, its diagnostics, and the passthrough corrector state, giving the corrector a named return value that can grow new per-step payloads without breaking call sites. `build_corrector_diagnostics` computes the delta from an explicit declared touched-name set rather than by diffing tensors by object identity.

This is a pure addition: nothing returns or consumes these types yet, so corrected outputs, rollout state, losses, logged metrics, and saved files are unchanged, and existing checkpoints load unchanged. A follow-up change has the corrector return `CorrectorOutput` and removes the current object-identity capture; this PR lands the foundation in isolation so it can be reviewed on its own.

Changes:
- `fme.core.corrector.output.CorrectorDiagnostics` — new value object with a single `delta` `TensorMapping`, defaulting to empty.
- `fme.core.corrector.output.CorrectorOutput` — new value object bundling `corrected`, `diagnostics`, and a passthrough `corrector_state`.
- `fme.core.corrector.utils.build_corrector_diagnostics` — new helper computing `delta = corrected − input_snapshot` over an explicit touched-name set (cumulative for a name touched more than once).

- [ ] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated
@jpdunc23 jpdunc23 requested a review from mcgibbon June 26, 2026 17:12
Comment thread pr-plan.md
label and metric fields without re-opening the corrector return type.
"""

delta: TensorMapping = dataclasses.field(default_factory=dict)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Our plan also included having something like a prescribed field that is deemed to be replacing a field, not applying a delta, and would e.g. be treated differently in corrector-based losses.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A follow-on PR will handle labeling for default behavior. For various reasons the "delta" vs "prescribed" distinction is not well-defined (e.g., the hfds "prescribed" correction is both replacement and delta) but every correction can be described cleanly as delta = corrected - network_output. So the current plan I have in mind is to simply include labels which signal default intentions for downstream uses.

Let's please save the debate for the follow on PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed the prescribed-vs-delta concept needs to exist — the plan keeps it, just as a label on CorrectorDiagnostics in the follow-on PR rather than a second mapping here. Two reasons the label is the better home:

  1. It's additive — a new field on this value object, no change to the corrector return type or this helper.
  2. The bucket isn't always clean at the data layer: as you note, hfds is genuinely both a replacement and a delta, so a corrector can't authoritatively pre-sort fields. A label set where the loss/consumer knows its intent can.

And nothing is dropped in the meantime: for a fully-prescribed field, delta = prescribed − network_output captures the change exactly, with the prescribed value recoverable as output and the raw network value as output − delta. The loss in the next PR has everything it needs to treat prescribed fields differently.

— 🤖 Drafted by Claude (Opus 4.8) via Claude Code, posted on @jpdunc23's behalf.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the current plan I have in mind is to simply include labels which signal default intentions for downstream uses.

Let's please save the debate for the follow on PR.

OK, we can discuss this in the context of the PR that implements these features. At least we're in agreement a class like this with delta defined should be added, and I think the other details are best discussed in that PR's context.

So the current plan I have in mind is to simply include labels which signal default intentions for downstream uses.

We had a long discussion where this was your plan going into the conversation, I posed my concerns with the plan, and we agreed on a different plan. That plan was basically to use multiple TensorMapping instead of labels, which would a single field to exist in multiple label buckets and be easier to work with than low-level one-hot-encodings of labels. Going back to the original plan I had concerns with will likely mean re-hashing that discussion, but it should be easier to have in the context of the next PR.

I think we need to talk more about HFDS specifically even separate from these PRs, because I'm kind of confused by some of it. Like the idea of corrector regularization and pre-corrector regularization, to my mind, are that we want to make it so that if the corrector were dropped, the network would give reasonable outputs, and that if we then use the corrector it's just being used for minor corrections. But that really doesn't seem to fit for HFDS.

I wonder if we might need a separate concept, outside of the corrector, for these "outputs that are derived from network outputs", and doing that could completely remove the need to have "prescribed" terms for the corrector (since they would live in the derived-output concept instead).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long discussion where this was your plan going into the conversation, I posed my concerns with the plan, and we agreed on a different plan. That plan was basically to use multiple TensorMapping instead of labels, which would a single field to exist in multiple label buckets and be easier to work with than low-level one-hot-encodings of labels. Going back to the original plan I had concerns with will likely mean re-hashing that discussion, but it should be easier to have in the context of the next PR.

Actually trying to implement the plan we discussed last Thursday led to too much friction, thus the change in tack.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually trying to implement the plan we discussed last Thursday led to too much friction, thus the change in tack.

Hm, having just looked at some of the corrector code I think I can imagine what you mean. For example, some values get derived only after certain corrections are applied.

These scientific issues are something we're going to have to address, but maybe let's move forward with the PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea of corrector regularization and pre-corrector regularization, to my mind, are that we want to make it so that if the corrector were dropped, the network would give reasonable outputs, and that if we then use the corrector it's just being used for minor corrections

I agree with this

But that really doesn't seem to fit for HFDS

I don't agree. For the "prescribed" method of HFDS correction (which I now realize is a terrible name), we still have a hybrid of ML + surface forcing.

  • Where sea ice fraction is 0 (fully ocean) the output is the "derived" surface flux from the forcing; this always has small irreducible error in practice because our AM4 fluxes are output on a grid that is not fully consistent with the OM4 surface flux target HFDS.
  • Where sea ice fraction is 1 (fully ice) the output is fully ML.

But the network still predicts something for HFDS over the entire ocean including open ocean grid cells that are entirely replaced by the derived flux from the AM4 flux forcing. Optimizing the error of that output (or penalizing the magnitude of the delta) regardless encourages HFDS to stay close to the target (which, as we saw, improved the interannual variability of the 3D sea water temperature), but "prescribing" the open ocean flux forces Samudra to use the AM4 reference flux forcing when correcting the ocean heat content.

I wonder if we might need a separate concept, outside of the corrector, for these "outputs that are derived from network outputs"

Don't we already have this concept via derive_func?

Comment thread pr-plan.md
Comment on lines +24 to +25
delta[name] = corrected[name] − network_output[name] for each variable the
corrector declares it touched. Empty when no corrector ran or none modified

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: "replaced" values like the advected total column humidity in the case of columnwise moisture budget closure would not be included in "delta", at least not by default.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delta = corrected - network_output is a clean and well-defined definition for every network output, regardless of whether it is replaced. The "replaced" vs "delta" distinction adds unnecessary confusion to the output structure that is not necessary at this stage.

In a follow-on PR, intended default downstream usage can be added when it is needed; that's out of scope here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick clarification on the mechanic, since I think this case is actually covered: the touched-name set is the union of every variable the enabled options write, which includes replaced fields like the advected total column humidity under moisture-budget closure. So those fields are in delta (= prescribed_value − network_value), with the raw network value recoverable as output − delta — nothing replaced is excluded. Whether a consumer later chooses to treat a replaced field differently (e.g. drop it from a correction-magnitude loss) is the labeling policy from the other thread, in the follow-on PR.

— 🤖 Drafted by Claude (Opus 4.8) via Claude Code, posted on @jpdunc23's behalf.

Comment thread pr-plan.md
Comment on lines +62 to +80
def build_corrector_diagnostics( # NEW
input_snapshot: TensorMapping,
corrected: TensorMapping,
touched_names: Iterable[str],
) -> CorrectorDiagnostics:
"""Build the diagnostics for a corrector from an explicit touched-name set.

delta[name] = corrected[name] − input_snapshot[name] for name in touched_names.
This is a declared diff over a known name set — it replaces the role of the
object-identity ``captured_before`` heuristic (removed when the corrector
starts returning ``CorrectorOutput``). For a name modified by more than one
enabled correction, ``delta`` is the cumulative ``corrected − input_snapshot``,
so ``input_snapshot = corrected − delta`` stays exact with no per-operation
bookkeeping.
"""
delta = {
name: corrected[name] - input_snapshot[name] for name in touched_names
}
return CorrectorDiagnostics(delta=delta)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: this doesn't allow us to control what corrector impacts are deemed to be deltas and which ones aren't, as we discussed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All corrections are delta = corrected - network_output. Downstream handling for diagnostic and optim purposes is intentionally out of scope for this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue this PR preserves that control rather than removing it. Because the corrected value (StepOutput.output) is always present alongside delta, every per-field view stays recoverable downstream:

  • correction magnitude → delta
  • raw network value → output − delta
  • multiplicative scale → output / (output − delta)

So the correction loss/metric in the next PR can decide, per field, whether to treat an impact as a delta, a replacement, or neither — driven by a label or config, where the intent actually lives. The control isn't lost; it moves from a corrector-time storage split to a consumer-time policy, which is strictly more flexible since the corrector no longer has to pre-commit each field to a bucket. Storing one uniform delta is what keeps all of those options open.

— 🤖 Drafted by Claude (Opus 4.8) via Claude Code, posted on @jpdunc23's behalf.

@mcgibbon mcgibbon Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue isn't that delta is poorly defined, it's that the purpose is supposed to be to enable regularization strategies, but it doesn't fit that purpose.

For HDFS, with residual_prediction delta = ocean_fraction · net_flux. What's the purpose of regularizing this? When using prescribed, delta = ocean_fraction · (net_flux − gen_hfds) becomes a regularization term that encourages gen_hfds to become equal to net_flux. Is net_flux what the channel is being optimized on? If not, this is also an issue.

It's difficult to review this delta in a context where it's not used, can we move it to the next PR?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #1321 (comment) we're in agreement that delta stays in this PR.

@mcgibbon

Copy link
Copy Markdown
Contributor

I don't know if you were planning on handling these in downstream PRs, but at least build_corrector_diagnostics seems like something to avoid as it can cause issues with the overall design in later PRs if ever used.

@jpdunc23

Copy link
Copy Markdown
Member Author

I don't know if you were planning on handling these in downstream PRs, but at least build_corrector_diagnostics seems like something to avoid as it can cause issues with the overall design in later PRs if ever used.

build_corrector_diagnostics won't cause issues with design in later PRs. All corrected fields can be cleanly described as delta = corrected - network_output. Later PRs will tackle the handling of default optimization behavior, but that is out-of-scope for the time being.

@jpdunc23

Copy link
Copy Markdown
Member Author

Sharing the reasoning for keeping it, since the real question is whether it boxes in the later design — I don't think it does.

build_corrector_diagnostics is deliberately the most general primitive available: a declared diff, delta[name] = corrected[name] − network_output[name], over the set of names a corrector says it touched. It computes a value, not a policy. Every later feature on the roadmap — a prescribed/delta label, per-field loss treatment, an opt-out for replaced fields — is additive on CorrectorDiagnostics (a new field on the value object) and leaves this helper's contract untouched. So if anything it's the foundation those labels sit on, not something they'd fight.

What would box us in is the opposite: deciding at corrector time which fields are "real" deltas vs replacements and storing only that. Because StepOutput.output is always present, a single delta keeps every view recoverable downstream — raw network value = output − delta, scale = output / (output − delta) — so no later policy is foreclosed. More detail on the inline threads.

— 🤖 Drafted by Claude (Opus 4.8) via Claude Code, posted on @jpdunc23's behalf.

@mcgibbon

mcgibbon commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

We should move forward with a PR based on this plan, I sign off: #1321 (comment)

jpdunc23 added 2 commits June 26, 2026 12:41
…ctor

main introduced the CorrectionSequence builder pattern (corrections are now
composable Correction objects assembled by each config's _build). The plan's
value objects and build_corrector_diagnostics helper are structure-agnostic and
unchanged; update the utils.py snippet to name its new neighbors and add a
forward note that the touched-name set will be the union of names each enabled
Correction declares it writes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants