Refactor inference data_writer configs to the Builder pattern#1314
Refactor inference data_writer configs to the Builder pattern#1314mcgibbon wants to merge 1 commit into
Conversation
Correct the Builder-pattern-pass violations in fme/ace/inference/data_writer/ so config schema is decoupled from the runtime writer impls. Behavior-preserving. Changes: - `fme.ace.inference.data_writer.main.DataWriterConfig`: `build`/`build_paired` now construct the full list of sub-writers (raw/monthly/file writers, plus the time-coarsen wrapper) and pass them to `DataWriter`/`PairedDataWriter` (Rule 2). The impls take a pre-built `writers` list instead of raw sub-configs and the enable flags. Add `DataWriterConfig.validate_time_coarsen` as the sanctioned accessor for parent configs. - `fme.ace.inference.data_writer.file_writer`: add `FileWriterParams`, a plain leaf params dataclass; `FileWriter` takes it instead of the non-leaf `FileWriterConfig` (Rule 1). Add `FileWriterConfig.validate_time_coarsen`. - `fme.ace.inference.inference.InferenceConfig` / `fme.ace.inference.evaluator.InferenceEvaluatorConfig`: `__post_init__` validate time coarsening via `data_writer.validate_time_coarsen(...)` instead of reading `data_writer.time_coarsen`/`.files` raw fields (Rule 1, parent read). - Tests: construct writers through `DataWriterConfig.build`/`build_paired` and `FileWriterConfig._build_writer_params` rather than the old impl signatures. - [ ] Tests added
ef7b149 to
ee0dd97
Compare
frodre
left a comment
There was a problem hiding this comment.
Thanks @mcgibbon. This is a nice consistency update that cleans up a fair amount of usage. Overall, I think the consolidation into builds is worthwhile. I'm not convinced of the need for a FileWriterParams over the configuration usage. I think there are simpler fixes to the issue of the runtime subselect_horizontal and I don't know if I agree with
dataclass attributes configured from yaml with dacite are considered private, and can only be used in methods on that dataclass
If a built component stores a configuration instead of locally re-storing all the information that could be useful, it should be able to look at that configuration. I do think the items that are publicly available from configurations should be shallow/imutable features of the config, maybe?
Other than that, I did find the agent generated PR description here used language or artifacts I'm not familiar with (leaf, builder audit, rulesets etc). If it's based on a document it would be helpful if you had the agent link to descriptions of the task with source of truth for definitions included.
|
|
||
|
|
||
| @dataclasses.dataclass | ||
| class FileWriterParams: |
There was a problem hiding this comment.
This should be a private class, right?
There was a problem hiding this comment.
I'm not convinced we need a new parameter handoff class to tighten usage declaration over just using the configuration in the build. While builds might have access to other fields they don't need, I don't necessarily think that's a large issue, given it's tightly coupled to code in this module. We could explore whether or not we need to restructure the top-level configuration and sub-configuration types.
| return FileWriterParams( | ||
| label=self.label, | ||
| names=self.names, | ||
| subselect_horizontal=bool(self.lat_extent or self.lon_extent), |
There was a problem hiding this comment.
We could have subselect_horizontal be a property and it would achieve the same thing, I think.
| ) -> "PairedDataWriter": | ||
| writers: list[PairedSubwriter] = [] | ||
| if self.save_prediction_files: | ||
| raw_writer: PairedSubwriter = PairedRawDataWriter( |
There was a problem hiding this comment.
There's a lot of duplication in these build calls, but that wasn't introduced by this PR. Just noticing.
| self, | ||
| writers: list[PairedSubwriter], | ||
| path: str, | ||
| initial_condition_times: npt.NDArray[cftime.datetime], |
| self.forward_steps_in_memory, | ||
| self.n_forward_steps, | ||
| ) | ||
| self.data_writer.validate_time_coarsen( |
There was a problem hiding this comment.
Another nice cleanup from this refactor.
Corrects the Builder-pattern-pass violations in
fme/ace/inference/data_writer/flagged by the 2026-06-24 builder-pattern audit, so the config schema (the dacite/YAML surface) is decoupled from the runtime writer implementations. The change is behavior-preserving. Follows the file-ordering / leaf-*Paramstemplate set by the corrector refactor (#1313).The audit flagged three issues here:
DataWriterConfig.build/build_pairedpassed rawtime_coarsen/filessub-configs intoDataWriter/PairedDataWriter, which then built the sub-writers themselves.FileWriterreceived the whole non-leafFileWriterConfigand read ~7 of its fields.InferenceConfig/InferenceEvaluatorConfig.__post_init__reached intodata_writer.time_coarsenand iterateddata_writer.files/file_config.time_coarsen.Changes:
fme.ace.inference.data_writer.main.DataWriterConfig:build/build_pairedconstruct the full list of sub-writers (raw/monthly/file writers, plus the time-coarsen wrapper) and pass them in;DataWriter/PairedDataWriter.__init__now take a pre-builtwriterslist instead of raw sub-configs + enable flags. Addedvalidate_time_coarsen()as the sanctioned accessor for parent configs.fme.ace.inference.data_writer.file_writer: addedFileWriterParams, a plain leaf params dataclass;FileWritertakes it (storesself._params, neverself._config) instead of the non-leafFileWriterConfig. AddedFileWriterConfig.validate_time_coarsen()and_build_writer_params().fme.ace.inference.inference.InferenceConfig/fme.ace.inference.evaluator.InferenceEvaluatorConfig:__post_init__now callsdata_writer.validate_time_coarsen(...)instead of reading the raw fields.DataWriterConfig.build/build_pairedandFileWriterConfig._build_writer_paramsrather than the old impl constructor signatures.Notes:
FileWriterParams.time_selectionholds the selector sub-configs (Slice | MonthSelector | TimeSlice), so the struct is a "runtime params" leaf in spirit (it forwards an opaque selector straight to_select_time, which dispatches via each selector's own methods — no raw cross-config field reads) rather than a strict field-type leaf. The alternative (re-deriving the selection inside the builder) would duplicate_select_time's dispatch.The non-paired
DataWriter's HEALPix (face) early-return is preserved inDataWriterConfig.build(it produces an empty writer list).path/metadata are now always stored, which only affects the never-used-in-production HEALPixDataWriter.writepath (previously wouldAttributeError; both inference entrypoints usebuild_paired, which never had afacecheck).The coupled
CoupledDataWriterConfigparent reads ofdata_writer.{ocean,atmosphere}.time_coarsenare a separate audit section (they wrap validation in per-component try/except messaging) and are left as a follow-up; this refactor keepstime_coarsena public field so nothing there breaks.Tests added (existing data_writer tests migrated to the public builder API; behavior coverage preserved)