Changes to how we do distance_to calculations #2379

Merged
maxwhitemet merged 6 commits into metoppv:master from maxwhitemet:distance_to_modifications
May 28, 2026

@maxwhitemet
Contributor

This PR refactors a lot of the original functionality used to create 'distance to' ancillaries (distance to ocean, rivers, lakes).

Previously, improver/generate_ancillaries/generate_distance_to_feature.py defined one class (DistanceTo), which was called by two functions in improver/generate_ancillaries/generate_miscellaneous_ancillaries.py to make specific ancillaries: generate_distance_to_ocean() and generate_distance_to_water(). This implementation obscured useful functionality.

The name generate_distance_to_ocean() reasonably suggests that the function can only generate a distance_to_ocean ancillary. A closer look at the code shows otherwise: it could generate any distance-to ancillary, such as distance-to-rivers or distance-to-lakes, if you supplied the relevant geometry (i.e. a rivers or lakes geodataframe instead of a coast geodataframe for the coastline argument).

I have therefore:

  • renamed the DistanceTo class to DistanceToFeature and merged into it the useful functionality from the functions in generate_miscellaneous_ancillaries.py that previously wrapped it;
  • created an additional class, DistanceToClosestFeature, by refactoring the code previously behind generate_distance_to_water(), reflecting how that code can be used much more generally;
  • moved the relevant unit tests around;
  • made the documentation more intuitive and more demonstrative of possible use cases.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@maxwhitemet maxwhitemet force-pushed the distance_to_modifications branch from 1d332cf to b41136e Compare May 13, 2026 15:54
Contributor

@mo-jbeaver mo-jbeaver left a comment

One minor suggestion added, but overall happy with the changes made.

site_cube: Cube,
geometry: GeoDataFrame,
exclude_outside_of: Optional[GeoDataFrame] = None,
exclusion_buffer: float = 10,
Contributor

If exclusion_buffer is only used when exclude_outside_of is provided, should it not be optional too?

Contributor Author

Yes. Implemented as suggested.
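
As a rough illustration of the agreed change, the sketch below makes the buffer optional and only resolves a value when a region is supplied. The helper name resolve_exclusion_buffer, the error on a buffer supplied without a region, and the fallback of 10 (copied from the old default in the signature above) are assumptions for illustration, not the merged implementation.

```python
from typing import Any, Optional

# Assumed fallback, mirroring the default shown in the original signature.
DEFAULT_EXCLUSION_BUFFER = 10.0


def resolve_exclusion_buffer(
    exclude_outside_of: Optional[Any] = None,
    exclusion_buffer: Optional[float] = None,
) -> Optional[float]:
    """Return the buffer to apply, or None when no region was supplied.

    Hypothetical helper: the argument names follow the review thread, but
    the function itself is not part of IMPROVER.
    """
    if exclude_outside_of is None:
        if exclusion_buffer is not None:
            # A buffer without a region has no effect, so flag it early.
            raise ValueError(
                "exclusion_buffer is only used when exclude_outside_of is provided."
            )
        return None
    if exclusion_buffer is None:
        return DEFAULT_EXCLUSION_BUFFER
    if exclusion_buffer < 0:
        raise ValueError("exclusion_buffer must be non-negative.")
    return float(exclusion_buffer)
```

Raising when a buffer arrives without a region is one possible design; silently ignoring the buffer would also satisfy the review comment.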

Member

@MoseleyS MoseleyS left a comment

I thought I'd try to review something to help out. Not sure my brain is working quite as well as I hoped. Anyway, here are two thoughts to accept or ignore.

Areas projection suitable for the UK.
new_name:
The name of the output cube.
output_name:
Member

While I agree that output_name is more descriptive, there are lots of examples of new_name for this purpose in IMPROVER. I favour consistency here, but feel free to argue back.

Contributor Author

Reverted the change, as suggested, and made the required associated changes across files that use this class.

site_coords, geometry_projection = self.project_geometry(geometry, site_cube)

- if self.clip_geometry_flag:
+ if self._should_clip_geometry:
Member

"Should" sounds optional. I agree that renaming is good, but something imperative might be better - _do_geometry_clipping?

Contributor Author

Implemented as suggested.

Comment thread improver/generate_ancillaries/generate_distance_to_feature.py
Comment on lines +217 to +221
exclude_outside_of: GeoDataFrame,
exclusion_buffer: float,
) -> List[int]:
"""Apply exclusion geometry logic: set distances to 0 for sites outside the
exclusion geometry.
Contributor

@mo-AliceLake mo-AliceLake May 19, 2026

This may just be me, but I find it a bit counterintuitive that sites outside the exclusion geometry are set to 0.

I think exclude_outside_of is a great, clear name, so maybe this just needs to be a bit clearer in the docstring description? Perhaps something like "set distances to 0 for sites outside of the provided geometry"?

Contributor Author

Implemented as suggested.
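
To make the agreed wording concrete, here is a deliberately simplified sketch of the behaviour: distances are kept for sites inside the provided region (expanded by the buffer) and set to 0 elsewhere. It substitutes an axis-aligned rectangle for the GeoDataFrame, and the function name, argument layout, and example values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def zero_distances_outside_region(
    distances: Sequence[float],
    sites: Sequence[Point],
    region: Box,
    buffer: float = 0.0,
) -> List[float]:
    """Keep distances for sites inside the buffered region; set the rest to 0.

    Expanding the box by `buffer` on every side is a simplification of a
    true geometric buffer, which would round the corners.
    """
    min_x, min_y, max_x, max_y = region
    results = []
    for distance, (x, y) in zip(distances, sites):
        inside = (
            min_x - buffer <= x <= max_x + buffer
            and min_y - buffer <= y <= max_y + buffer
        )
        results.append(distance if inside else 0.0)
    return results


# Made-up example: the third site lies well outside the buffered region.
example = zero_distances_outside_region(
    distances=[120.0, 80.0, 45.0],
    sites=[(0.5, 0.5), (1.2, 0.5), (3.0, 3.0)],
    region=(0.0, 0.0, 1.0, 1.0),
    buffer=0.5,
)
print(example)  # [120.0, 80.0, 0.0]
```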

distance_results:
The calculated distances to the feature.
exclude_outside_of:
A GeoDataFrame containing the exclusion geometry. Sites outside this
Contributor

Same as above: I find "exclusion geometry" a bit counterintuitive. Could we go with something like "geometry defining the valid region" instead of "exclusion geometry"?

Contributor Author

Implemented as suggested, and made some small changes throughout the docs to reflect this update.

Comment thread improver/generate_ancillaries/generate_distance_to_feature.py
A cube containing the distance to closest feature ancillary data.
"""
import numpy as np

Contributor

@mo-AliceLake mo-AliceLake May 19, 2026

My main comment for this script:

I know you ask for this in the docstring, but for safety, can we include a check here that each cube does contain the same sites (in the same order, etc)? Similarly, as we rely on the first cube in the list to provide us with metadata, can we check that all the metadata matches?

I think we might also need to test that distance_to_features isn't empty.

Contributor Author

I have now tested:

  • metadata compatibility: cubes contain the same sites (incl. in the same order) and other dimension metadata.
  • distance_to_features isn't empty
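
A minimal sketch of what checks of this kind could look like. SiteCube is a hypothetical stand-in for an Iris cube that carries only a name, units, and an ordered tuple of site identifiers; real checks would compare cube coordinates and metadata instead, and the error messages are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SiteCube:
    """Hypothetical, heavily reduced stand-in for a site cube."""

    name: str
    units: str
    site_ids: Tuple[str, ...]
    data: Tuple[float, ...]


def check_cubes_compatible(cubes: Sequence[SiteCube]) -> None:
    """Raise ValueError if the inputs cannot be combined site by site."""
    if not cubes:
        raise ValueError("At least one distance-to-feature cube is required.")
    # The first cube acts as the reference that every other cube must match.
    template = cubes[0]
    for cube in cubes[1:]:
        # Tuple comparison is order-sensitive, so reordered sites also fail.
        if cube.site_ids != template.site_ids:
            raise ValueError(
                f"Sites in '{cube.name}' do not match those in '{template.name}'."
            )
        if cube.units != template.units:
            raise ValueError(
                f"Units of '{cube.name}' differ from those of '{template.name}'."
            )
```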


# Calculate the minimum distance across all features
distances_to_features = np.stack([cube.data for cube in distance_to_features])
min_distance = np.min(distances_to_features, axis=0)
Contributor

We could also use distances_to_features.min(axis=0) here since this is a NumPy array, but both are equivalent, so just a style preference. 🙂

Contributor Author

Implemented as suggested.
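
For reference, a small check that the two spellings agree on a stacked array. The distances and the feature labels in the comment are made up for illustration.

```python
import numpy as np

# Rows are features (e.g. ocean, rivers, lakes); columns are sites.
distances_to_features = np.stack(
    [
        np.array([1200.0, 300.0, 50.0]),
        np.array([800.0, 450.0, 75.0]),
        np.array([950.0, 100.0, 20.0]),
    ]
)

# Function form and method form reduce over the feature axis identically.
min_via_function = np.min(distances_to_features, axis=0)
min_via_method = distances_to_features.min(axis=0)

assert np.array_equal(min_via_function, min_via_method)
print(min_via_method.tolist())  # [800.0, 100.0, 20.0]
```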

min_distance = np.min(distances_to_features, axis=0)

# Create a new cube for the distance to closest feature
distance_to_closest = distance_to_features[0].copy(data=min_distance)
Contributor

This makes sense 🙂 It might be worth making it explicit (e.g. via a small comment or by creating a template_cube variable) that the first cube is being used as the metadata template here.

Contributor Author

Added a small extension to the comment

Comment on lines +467 to +468
if output_name:
distance_to_closest.rename(output_name)
Contributor

Is this the behaviour we really want here? I worry it could be confusing to end up with a cube called (for example) distance_to_sea, when it actually represents distance to sea and rivers and lakes, etc., and that traceability has been lost.

Would it be better to default to None/"undefined", or alternatively make output_name a required argument?

Contributor Author

Good spot. I have made the argument required across both plugins.
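
One way to enforce a required name in Python is a keyword-only parameter with no default, so a caller cannot omit it by accident. The function below is a hypothetical illustration that works on plain lists and returns a dictionary; it is not the plugin's actual signature.

```python
from typing import Dict, List, Sequence, Union


def combine_minimum_distances(
    distances_per_feature: Sequence[Sequence[float]],
    *,
    output_name: str,
) -> Dict[str, Union[str, List[float]]]:
    """Take the per-site minimum across features and label the result.

    `output_name` is keyword-only and has no default, so omitting it
    raises a TypeError at call time.
    """
    if not output_name:
        raise ValueError("output_name must be a non-empty string.")
    # zip(*...) groups the values for each site across all features.
    minimum = [min(values) for values in zip(*distances_per_feature)]
    return {"name": output_name, "data": minimum}


result = combine_minimum_distances(
    [[1200.0, 300.0], [800.0, 450.0]],
    output_name="distance_to_water",
)
print(result["data"])  # [800.0, 300.0]
```

Forcing the caller to pick the name keeps the label honest when the output combines several features.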

@@ -0,0 +1,66 @@
# (C) Crown Copyright, Met Office. All rights reserved.
Contributor

This is a great start to the unit tests for this script 🙂

A few follow-ups based on earlier comments. It might be worth adding some additional tests to make the intended behaviour a bit more robust:

  • If we default to None / "undefined" when no output_name is provided, it would be good to include a test for that.
  • Since we rely on the first cube as a metadata template, it would be good to include a test that an error is raised if the metadata is not consistent across cubes.
  • It would also be helpful to explicitly check behaviour for an empty CubeList (e.g. raising a clear error).
  • Finally, since we assume all input cubes contain the same sites, a test ensuring mismatched shapes or ordering raises an error would help avoid silent issues.

Contributor Author

Implemented tests that cover the above suggestions.

Contributor

@mo-AliceLake mo-AliceLake left a comment

Overall the changes look good, and the code is clear. I found it much easier to follow than before these changes, particularly the explanation of why DistanceToClosestFeature might be needed/used. 🙂

I've left a few smaller comments around docstrings, happy for those to be taken or left as they are.

My main thought is that it's probably worth making some of the behaviour in the DistanceToClosestFeature class a bit more robust by adding explicit checks, as there are a few edge cases where it could silently fail and/or give misleading output.

Contributor Author

@maxwhitemet maxwhitemet left a comment

Thank you for your reviews @mo-AliceLake, @mo-jbeaver, @MoseleyS. I have responded to your feedback and implemented all suggestions.

@maxwhitemet maxwhitemet requested a review from mo-AliceLake May 27, 2026 17:00
Contributor

@mo-AliceLake mo-AliceLake left a comment

Looks great, can see all comments have been addressed - thanks Max. 🙂

@maxwhitemet maxwhitemet requested a review from mo-jbeaver May 28, 2026 09:47
Contributor

@mo-jbeaver mo-jbeaver left a comment

Happy with the updates made and the tests passed successfully.

@maxwhitemet maxwhitemet merged commit 01e222c into metoppv:master May 28, 2026
11 of 13 checks passed
4 participants