Skip to content

fix: parsing relative paths in workflow yml#1365

Draft
LuukBlom wants to merge 13 commits into
mainfrom
fix/parsing-relative-paths-in-workflow-yml
Draft

fix: parsing relative paths in workflow yml#1365
LuukBlom wants to merge 13 commits into
mainfrom
fix/parsing-relative-paths-in-workflow-yml

Conversation

@LuukBlom

@LuukBlom LuukBlom commented Feb 5, 2026

Copy link
Copy Markdown
Collaborator

Issue addressed

Fixes #1350

Explanation

Added the validator resolve_paths_or_catalogs in HydromtGlobalConfig, which is used to validate the workflow yaml.

Refactored read_workflow_yaml to now actually return HydromtModelSetup instead of 3 dicts.

Moved all of the validation of the steps from the Model class into the step classes in model_config.py
Now has:

  • RawStep: this class validates the structure of the yaml, no runtime function binding yet.
  • HydromtModelStep: class that validates the hydromt_step part, has the function as an attribute.

Cyclic import errors

While fixing this, I encountered some problematic cyclic import errors, and after some thought I have re-introduced hydromt.io again.

The problem is that some of the readers depend on other parts of the hydromt codebase, when they should only read the files and return content. This means importing any reader (even just read_toml), will import a lot of other hydromt modules.

My idea for the separation:

  • _io: just file io, should not depend / know about any other part of hydromt
  • parsers: take in raw data and return validated hydromt objects/classes
  • readers: combine io and parsers to read and validate files, also exposes _io functions like read_toml for backwards compatibility

This separation means we can import the low-level _io of read_yaml or read_toml pretty much everywhere in hydromt since they have no hydromt dependencies.

General Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation
  • Updated changelog.rst

Data/Catalog checklist

  • data/catalogs/predefined_catalogs.yml has not been modified.
  • None of the old data_catalog.yml files have been changed
  • data/changelog.rst has been updated
  • new file uses LF line endings (done automatically if you used update_versions.py)
  • New file has been tested locally
  • Tests have been added using the new file in the test suite

Additional Notes (optional)

Add any additional notes or information that may be helpful.

@dalmijn

dalmijn commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

I haven't reviewed it, but did notice the io.py submodule returning which we deliberately removed in #1308.
Please keep io in the readers and writers.

@LuukBlom

LuukBlom commented Feb 9, 2026

Copy link
Copy Markdown
Collaborator Author

I haven't reviewed it, but did notice the io.py submodule returning which we deliberately removed in #1308. Please keep io in the readers and writers.

Please read the PR description for my reasoning why I added it again.

@sonarqubecloud

sonarqubecloud Bot commented Feb 9, 2026

Copy link
Copy Markdown

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

@dalmijn

dalmijn commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

I realise that, but it can cause issues as specified in PR #1308 and it's subsequent issue.

@LuukBlom

LuukBlom commented Feb 9, 2026

Copy link
Copy Markdown
Collaborator Author

I realise that, but it can cause issues as specified in PR #1308 and it's subsequent issue.

Ah okay I wasnt aware of that. Will rename to something else

@LuukBlom LuukBlom requested a review from deltamarnix February 9, 2026 12:42
Comment thread hydromt/_validators/data_catalog_v0x.py Outdated
@sonarqubecloud

sonarqubecloud Bot commented Feb 9, 2026

Copy link
Copy Markdown

@deltamarnix deltamarnix requested a review from Copilot February 9, 2026 15:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors workflow YAML parsing/validation to properly resolve relative paths and to return a validated HydromtModelSetup object, while also moving step validation/execution logic into step model classes and introducing a low-level IO module to reduce import cycles.

Changes:

  • Added hydromt._io and migrated YAML/TOML + URI YAML reading into it to reduce cyclic imports.
  • Refactored read_workflow_yaml to return a validated HydromtModelSetup via a new parse_workflow parser.
  • Moved step parsing/validation/execution into RawStep/HydromtModelStep and updated CLI/tests accordingly.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
hydromt/_io.py Introduces low-level YAML/TOML/URI helpers used across the package to avoid heavy imports.
hydromt/readers.py Refactors read_workflow_yaml to return HydromtModelSetup and delegates parsing to parse_workflow.
hydromt/parsers.py New parser module to merge defaults + validate workflow dicts into HydroMT objects.
hydromt/_validators/model_config.py Adds RawStep, step binding/execution, global config path/catalog resolution, and new setup schema.
hydromt/model/model.py Updates build/update to execute bound step objects; improves component instantiation from validated configs.
hydromt/model/steps.py Adds a global step registry populated by @hydromt_step.
hydromt/cli/main.py Updates CLI build/update/validate flow to use HydromtModelSetup output.
hydromt/data_catalog/data_catalog.py Switches YAML loading helper to hydromt._io.yml_from_uri_or_path.
hydromt/_validators/data_catalog_v0x.py Switches YAML loading helper to hydromt._io.yml_from_uri_or_path.
hydromt/_validators/data_catalog_v1x.py Switches YAML loading helper to hydromt._io.yml_from_uri_or_path.
hydromt/model/components/config.py Uses low-level IO functions from hydromt._io.
hydromt/model/example/example_model.py Adjusts ExampleModel init/component wiring to updated patterns/types.
tests/test_readers.py Adds workflow YAML parsing/path-resolution test coverage for new behavior.
tests/cli/test_cli.py Updates assertions for new step validation error message format.
tests/model/test_model.py Updates expectations for new step-not-found error type/message.
tests/data_catalog/test_data_catalog.py Updates to new YAML loader function location/name.
tests/components/test_config_component.py Adjusts imports to new read_yaml location.
tests/_validators/test_model_config.py Updates to new step model constructor signature + error types/messages.
tests/_validators/test_data_catalog_validation_v0x.py Updates to new YAML loader function location/name.
tests/_validators/test_data_catalog_validation_v1x.py Updates to new YAML loader function location/name.
examples/working_with_models.ipynb Updates workflow reading example to new return type.
examples/working_with_models_basics.ipynb Notebook metadata change (kernelspec display name).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hydromt/readers.py
Comment thread hydromt/readers.py Outdated
Comment thread hydromt/_io.py Outdated
Comment thread hydromt/_io.py
Comment thread hydromt/model/steps.py Outdated
Comment thread hydromt/model/steps.py Outdated
Comment thread tests/test_readers.py

@deltamarnix deltamarnix left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a look at the names of the config classes.
We also talked about what Model should know about. Should it know about the config classes, or should the config classes know about the model? Perhaps don't convert during the validation, but only validate and keep the data raw.

Comment thread hydromt/model/example/example_model.py Outdated
Comment thread hydromt/model/model.py Outdated
Comment thread hydromt/readers.py
Comment thread hydromt/parsers.py Outdated
rename _deep_merge to deep_merge
update _steps_contain_write
update example model with properties
@LuukBlom LuukBlom mentioned this pull request Feb 10, 2026
11 tasks

@hboisgon hboisgon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some food for thoughts and a couple of warnings on the current implementation.

As highlighted in the ther PR, I think we need a design session about in general how do we build a model with hydromt before we can really finalise this PR. This is major change for python users for now if we do this we should do it well and avoid changing (again) in the future.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what we choose but there are 4 pages in the docs that need updating (search for read_workflow_yaml) + update changelog
Also should API be updated? Users will need to know what object they are getting when calling read_workflow_yaml so this should be documented.

Comment thread hydromt/cli/main.py
from pydantic import ValidationError

from hydromt import __version__
from hydromt._io import read_yaml

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better keep using readers to import read_yaml then if I understand well?

Comment thread hydromt/cli/main.py
mode=mode,
data_libs=data_libs,
**kwargs,
**workflow.globals_.model_dump(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I am fond of model_dump here. Can we simplify for beginner python users? This is basic hydromt use building a model, I dont want to make it too complex at this stage.

from pathlib import Path
from typing import TYPE_CHECKING, Any, Dict, Optional, Tuple, Union, cast

from hydromt._io import read_toml, read_yaml

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess you can undo and go back to readers. Most likely plugins will copy and modify this component so we should limit calls to private functions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why any change in this file? I dont think we want model components to be initialised this way. _config is private for a reason. Same for the property definition, not sure we want to envourage plugins to define their components this way.

Comment thread hydromt/readers.py
Comment on lines +946 to +947
Access the model type with `.modeltype`, the global configuration with `.globals_`,
and the runtime steps with `.steps`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what modeltype, globals and steps are: str, ?, dict? needs documentation of the HydroMTModelSetup class in public API

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess .global is not possible? Then why not .global_ to try and keep close to the workflow yaml? global goes to model init so could also be .model_init?

Comment thread hydromt/parsers.py
# Determine model type
resolved_modeltype = modeltype or data.get("modeltype")
if resolved_modeltype is None:
raise ValueError("Model type not specified in workflow or arguments")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Model type not specified in workflow or arguments")
raise ValueError("Model type not specified in workflow YAML or arguments")

Comment thread hydromt/parsers.py
if (
abs_path
and root is not None
and (skip_abspath_sections is None or "global" not in skip_abspath_sections)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble following but what happens if "global" in skip_abspath_sections? Does this means that relative paths in steps will not be converted? Ie using a local data file instead of catalog entry for the setup functions?

Comment thread hydromt/parsers.py
raise ValueError("Model type not specified in workflow or arguments")

# Context for path resolution
context = {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect anything else in context than just root? If not why make it a dictionnary?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed in the other PR. See comments there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parsing relative data in the global section of the workflow yaml

5 participants