Skip to content

Refactor inference data_writer configs to the Builder pattern#1314

Draft
mcgibbon wants to merge 1 commit into
mainfrom
refactor/data-writer-builder-pattern
Draft

Refactor inference data_writer configs to the Builder pattern#1314
mcgibbon wants to merge 1 commit into
mainfrom
refactor/data-writer-builder-pattern

Conversation

@mcgibbon

@mcgibbon mcgibbon commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Corrects the Builder-pattern-pass violations in fme/ace/inference/data_writer/ flagged by the 2026-06-24 builder-pattern audit, so the config schema (the dacite/YAML surface) is decoupled from the runtime writer implementations. The change is behavior-preserving. Follows the file-ordering / leaf-*Params template set by the corrector refactor (#1313).

The audit flagged three issues here:

  • Rule 2DataWriterConfig.build/build_paired passed raw time_coarsen/files sub-configs into DataWriter/PairedDataWriter, which then built the sub-writers themselves.
  • Rule 1 (non-leaf impl read)FileWriter received the whole non-leaf FileWriterConfig and read ~7 of its fields.
  • Rule 1 (parent read)InferenceConfig/InferenceEvaluatorConfig.__post_init__ reached into data_writer.time_coarsen and iterated data_writer.files / file_config.time_coarsen.

Changes:

  • fme.ace.inference.data_writer.main.DataWriterConfig: build/build_paired construct the full list of sub-writers (raw/monthly/file writers, plus the time-coarsen wrapper) and pass them in; DataWriter/PairedDataWriter.__init__ now take a pre-built writers list instead of raw sub-configs + enable flags. Added validate_time_coarsen() as the sanctioned accessor for parent configs.
  • fme.ace.inference.data_writer.file_writer: added FileWriterParams, a plain leaf params dataclass; FileWriter takes it (stores self._params, never self._config) instead of the non-leaf FileWriterConfig. Added FileWriterConfig.validate_time_coarsen() and _build_writer_params().
  • fme.ace.inference.inference.InferenceConfig / fme.ace.inference.evaluator.InferenceEvaluatorConfig: __post_init__ now calls data_writer.validate_time_coarsen(...) instead of reading the raw fields.
  • Tests: build writers through DataWriterConfig.build/build_paired and FileWriterConfig._build_writer_params rather than the old impl constructor signatures.

Notes:

  • FileWriterParams.time_selection holds the selector sub-configs (Slice | MonthSelector | TimeSlice), so the struct is a "runtime params" leaf in spirit (it forwards an opaque selector straight to _select_time, which dispatches via each selector's own methods — no raw cross-config field reads) rather than a strict field-type leaf. The alternative (re-deriving the selection inside the builder) would duplicate _select_time's dispatch.

  • The non-paired DataWriter's HEALPix (face) early-return is preserved in DataWriterConfig.build (it produces an empty writer list). path/metadata are now always stored, which only affects the never-used-in-production HEALPix DataWriter.write path (previously would AttributeError; both inference entrypoints use build_paired, which never had a face check).

  • The coupled CoupledDataWriterConfig parent reads of data_writer.{ocean,atmosphere}.time_coarsen are a separate audit section (they wrap validation in per-component try/except messaging) and are left as a follow-up; this refactor keeps time_coarsen a public field so nothing there breaks.

  • Tests added (existing data_writer tests migrated to the public builder API; behavior coverage preserved)

Correct the Builder-pattern-pass violations in
fme/ace/inference/data_writer/ so config schema is decoupled from the
runtime writer impls. Behavior-preserving.

Changes:
- `fme.ace.inference.data_writer.main.DataWriterConfig`: `build`/`build_paired`
  now construct the full list of sub-writers (raw/monthly/file writers, plus
  the time-coarsen wrapper) and pass them to `DataWriter`/`PairedDataWriter`
  (Rule 2). The impls take a pre-built `writers` list instead of raw
  sub-configs and the enable flags. Add `DataWriterConfig.validate_time_coarsen`
  as the sanctioned accessor for parent configs.
- `fme.ace.inference.data_writer.file_writer`: add `FileWriterParams`, a plain
  leaf params dataclass; `FileWriter` takes it instead of the non-leaf
  `FileWriterConfig` (Rule 1). Add `FileWriterConfig.validate_time_coarsen`.
- `fme.ace.inference.inference.InferenceConfig` /
  `fme.ace.inference.evaluator.InferenceEvaluatorConfig`: `__post_init__`
  validate time coarsening via `data_writer.validate_time_coarsen(...)` instead
  of reading `data_writer.time_coarsen`/`.files` raw fields (Rule 1, parent read).
- Tests: construct writers through `DataWriterConfig.build`/`build_paired` and
  `FileWriterConfig._build_writer_params` rather than the old impl signatures.

- [ ] Tests added
@mcgibbon mcgibbon force-pushed the refactor/data-writer-builder-pattern branch from ef7b149 to ee0dd97 Compare June 24, 2026 22:18

@frodre frodre left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mcgibbon. This is a nice consistency update that cleans up a fair amount of usage. Overall, I think the consolidation into builds is worthwhile. I'm not convinced of the need for a FileWriterParams over the configuration usage. I think there are simpler fixes to the issue of the runtime subselect_horizontal and I don't know if I agree with

dataclass attributes configured from yaml with dacite are considered private, and can only be used in methods on that dataclass

If a built component stores a configuration instead of locally re-storing all the information that could be useful, it should be able to look at that configuration. I do think the items that are publicly available from configurations should be shallow/imutable features of the config, maybe?

Other than that, I did find the agent generated PR description here used language or artifacts I'm not familiar with (leaf, builder audit, rulesets etc). If it's based on a document it would be helpful if you had the agent link to descriptions of the task with source of truth for definitions included.



@dataclasses.dataclass
class FileWriterParams:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a private class, right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we need a new parameter handoff class to tighten usage declaration over just using the configuration in the build. While builds might have access to other fields they don't need, I don't necessarily think that's a large issue, given it's tightly coupled to code in this module. We could explore whether or not we need to restructure the top-level configuration and sub-configuration types.

return FileWriterParams(
label=self.label,
names=self.names,
subselect_horizontal=bool(self.lat_extent or self.lon_extent),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have subselect_horizontal be a property and it would achieve the same thing, I think.

) -> "PairedDataWriter":
writers: list[PairedSubwriter] = []
if self.save_prediction_files:
raw_writer: PairedSubwriter = PairedRawDataWriter(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplication in these build calls, but that wasn't introduced by this PR. Just noticing.

self,
writers: list[PairedSubwriter],
path: str,
initial_condition_times: npt.NDArray[cftime.datetime],

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a nice cleanup!

self.forward_steps_in_memory,
self.n_forward_steps,
)
self.data_writer.validate_time_coarsen(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice cleanup from this refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants