fix: parsing relative paths in workflow yml#1365
Conversation
|
I haven't reviewed it, but did notice the io.py submodule returning which we deliberately removed in #1308. |
Please read the PR description for my reasoning why I added it again. |
|
❌ The last analysis has failed. |
|
I realise that, but it can cause issues as specified in PR #1308 and it's subsequent issue. |
Ah okay I wasnt aware of that. Will rename to something else |
|
There was a problem hiding this comment.
Pull request overview
This PR refactors workflow YAML parsing/validation to properly resolve relative paths and to return a validated HydromtModelSetup object, while also moving step validation/execution logic into step model classes and introducing a low-level IO module to reduce import cycles.
Changes:
- Added
hydromt._ioand migrated YAML/TOML + URI YAML reading into it to reduce cyclic imports. - Refactored
read_workflow_yamlto return a validatedHydromtModelSetupvia a newparse_workflowparser. - Moved step parsing/validation/execution into
RawStep/HydromtModelStepand updated CLI/tests accordingly.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| hydromt/_io.py | Introduces low-level YAML/TOML/URI helpers used across the package to avoid heavy imports. |
| hydromt/readers.py | Refactors read_workflow_yaml to return HydromtModelSetup and delegates parsing to parse_workflow. |
| hydromt/parsers.py | New parser module to merge defaults + validate workflow dicts into HydroMT objects. |
| hydromt/_validators/model_config.py | Adds RawStep, step binding/execution, global config path/catalog resolution, and new setup schema. |
| hydromt/model/model.py | Updates build/update to execute bound step objects; improves component instantiation from validated configs. |
| hydromt/model/steps.py | Adds a global step registry populated by @hydromt_step. |
| hydromt/cli/main.py | Updates CLI build/update/validate flow to use HydromtModelSetup output. |
| hydromt/data_catalog/data_catalog.py | Switches YAML loading helper to hydromt._io.yml_from_uri_or_path. |
| hydromt/_validators/data_catalog_v0x.py | Switches YAML loading helper to hydromt._io.yml_from_uri_or_path. |
| hydromt/_validators/data_catalog_v1x.py | Switches YAML loading helper to hydromt._io.yml_from_uri_or_path. |
| hydromt/model/components/config.py | Uses low-level IO functions from hydromt._io. |
| hydromt/model/example/example_model.py | Adjusts ExampleModel init/component wiring to updated patterns/types. |
| tests/test_readers.py | Adds workflow YAML parsing/path-resolution test coverage for new behavior. |
| tests/cli/test_cli.py | Updates assertions for new step validation error message format. |
| tests/model/test_model.py | Updates expectations for new step-not-found error type/message. |
| tests/data_catalog/test_data_catalog.py | Updates to new YAML loader function location/name. |
| tests/components/test_config_component.py | Adjusts imports to new read_yaml location. |
| tests/_validators/test_model_config.py | Updates to new step model constructor signature + error types/messages. |
| tests/_validators/test_data_catalog_validation_v0x.py | Updates to new YAML loader function location/name. |
| tests/_validators/test_data_catalog_validation_v1x.py | Updates to new YAML loader function location/name. |
| examples/working_with_models.ipynb | Updates workflow reading example to new return type. |
| examples/working_with_models_basics.ipynb | Notebook metadata change (kernelspec display name). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
deltamarnix
left a comment
There was a problem hiding this comment.
Let's have a look at the names of the config classes.
We also talked about what Model should know about. Should it know about the config classes, or should the config classes know about the model? Perhaps don't convert during the validation, but only validate and keep the data raw.
rename _deep_merge to deep_merge update _steps_contain_write update example model with properties
hboisgon
left a comment
There was a problem hiding this comment.
Some food for thoughts and a couple of warnings on the current implementation.
As highlighted in the ther PR, I think we need a design session about in general how do we build a model with hydromt before we can really finalise this PR. This is major change for python users for now if we do this we should do it well and avoid changing (again) in the future.
There was a problem hiding this comment.
Depending on what we choose but there are 4 pages in the docs that need updating (search for read_workflow_yaml) + update changelog
Also should API be updated? Users will need to know what object they are getting when calling read_workflow_yaml so this should be documented.
| from pydantic import ValidationError | ||
|
|
||
| from hydromt import __version__ | ||
| from hydromt._io import read_yaml |
There was a problem hiding this comment.
Better keep using readers to import read_yaml then if I understand well?
| mode=mode, | ||
| data_libs=data_libs, | ||
| **kwargs, | ||
| **workflow.globals_.model_dump(), |
There was a problem hiding this comment.
Not sure I am fond of model_dump here. Can we simplify for beginner python users? This is basic hydromt use building a model, I dont want to make it too complex at this stage.
| from pathlib import Path | ||
| from typing import TYPE_CHECKING, Any, Dict, Optional, Tuple, Union, cast | ||
|
|
||
| from hydromt._io import read_toml, read_yaml |
There was a problem hiding this comment.
Guess you can undo and go back to readers. Most likely plugins will copy and modify this component so we should limit calls to private functions
There was a problem hiding this comment.
Why any change in this file? I dont think we want model components to be initialised this way. _config is private for a reason. Same for the property definition, not sure we want to envourage plugins to define their components this way.
| Access the model type with `.modeltype`, the global configuration with `.globals_`, | ||
| and the runtime steps with `.steps`. |
There was a problem hiding this comment.
Not clear what modeltype, globals and steps are: str, ?, dict? needs documentation of the HydroMTModelSetup class in public API
There was a problem hiding this comment.
I guess .global is not possible? Then why not .global_ to try and keep close to the workflow yaml? global goes to model init so could also be .model_init?
| # Determine model type | ||
| resolved_modeltype = modeltype or data.get("modeltype") | ||
| if resolved_modeltype is None: | ||
| raise ValueError("Model type not specified in workflow or arguments") |
There was a problem hiding this comment.
| raise ValueError("Model type not specified in workflow or arguments") | |
| raise ValueError("Model type not specified in workflow YAML or arguments") |
| if ( | ||
| abs_path | ||
| and root is not None | ||
| and (skip_abspath_sections is None or "global" not in skip_abspath_sections) |
There was a problem hiding this comment.
I'm having trouble following but what happens if "global" in skip_abspath_sections? Does this means that relative paths in steps will not be converted? Ie using a local data file instead of catalog entry for the setup functions?
| raise ValueError("Model type not specified in workflow or arguments") | ||
|
|
||
| # Context for path resolution | ||
| context = {} |
There was a problem hiding this comment.
Do you expect anything else in context than just root? If not why make it a dictionnary?
There was a problem hiding this comment.
Reviewed in the other PR. See comments there.



Issue addressed
Fixes #1350
Explanation
Added the validator
resolve_paths_or_catalogsinHydromtGlobalConfig, which is used to validate the workflow yaml.Refactored
read_workflow_yamlto now actually returnHydromtModelSetupinstead of 3 dicts.Moved all of the validation of the steps from the
Modelclass into the step classes inmodel_config.pyNow has:
RawStep: this class validates the structure of the yaml, no runtime function binding yet.HydromtModelStep: class that validates thehydromt_steppart, has the function as an attribute.Cyclic import errors
While fixing this, I encountered some problematic cyclic import errors, and after some thought I have re-introduced
hydromt.ioagain.The problem is that some of the readers depend on other parts of the hydromt codebase, when they should only read the files and return content. This means importing any reader (even just
read_toml), will import a lot of other hydromt modules.My idea for the separation:
_io: just file io, should not depend / know about any other part of hydromtparsers: take in raw data and return validated hydromt objects/classesreaders: combineioandparsersto read and validate files, also exposes _io functions likeread_tomlfor backwards compatibilityThis separation means we can import the low-level _io of
read_yamlorread_tomlpretty much everywhere in hydromt since they have no hydromt dependencies.General Checklist
mainData/Catalog checklist
data/catalogs/predefined_catalogs.ymlhas not been modified.data_catalog.ymlfiles have been changeddata/changelog.rsthas been updatedLFline endings (done automatically if you usedupdate_versions.py)Additional Notes (optional)
Add any additional notes or information that may be helpful.