Changes to how we do distance_to calculations by maxwhitemet · Pull Request #2379 · metoppv/improver

maxwhitemet · 2026-05-13T15:33:04Z

This PR refactors a lot of the original functionality used to create 'distance to' ancillaries (distance to ocean, rivers, lakes).

Previously, improver/generate_ancillaries/generate_distance_to_feature.py defined one class (DistanceTo), which was called by two functions in improver/generate_ancillaries/generate_miscellaneous_ancillaries.py to make specific ancillaries: generate_distance_to_ocean() and generate_distance_to_water(). This implementation obscured useful functionality.

The name generate_distance_to_ocean() reasonably suggests that the function can only generate a distance-to-ocean ancillary, but a closer look at the code shows this is not true: it could generate any distance-to ancillary, such as distance to rivers or distance to lakes, if supplied with a relevant geodataframe (i.e. a rivers or lakes geodataframe instead of a coastline geodataframe for the coastline argument).

I have therefore (1) renamed the DistanceTo class to DistanceToFeature and merged into it the useful functionality from the wrapper functions previously in generate_miscellaneous_ancillaries.py, (2) created an additional class, DistanceToClosestFeature, by refactoring the code previously used for generate_distance_to_water(), reflecting how much more generally this code can be used, (3) moved the relevant unit tests and (4) made the documentation more intuitive, with examples of possible use cases.
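To make the generality argument concrete, here is a minimal, hypothetical sketch (not IMPROVER's implementation). It assumes sites and feature geometries have already been projected into a common Cartesian coordinate system, and represents each feature as an array of straight line segments rather than a GeoDataFrame. The function name and array layout are illustrative only; the point is that the same calculation serves a coastline, a river or a lake, and only the input geometry changes.

```python
import numpy as np


def distance_to_feature(sites: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Shortest distance from each site to any segment of a feature.

    Hypothetical sketch. ``sites`` has shape (n_sites, 2) and ``segments`` has
    shape (n_segments, 2, 2): a start and an end point per segment, both in
    the same projected x/y coordinates.
    """
    start = segments[:, 0, :]
    direction = segments[:, 1, :] - start
    # Squared segment lengths; the floor avoids dividing by zero for
    # degenerate (zero-length) segments.
    length_sq = np.maximum((direction**2).sum(axis=1), np.finfo(float).tiny)
    # Offset of every site from every segment start: (n_sites, n_segments, 2).
    offset = sites[:, None, :] - start[None, :, :]
    # Position of the closest point along each segment, clamped to its ends.
    t = np.clip((offset * direction).sum(axis=2) / length_sq, 0.0, 1.0)
    closest = start[None, :, :] + t[..., None] * direction[None, :, :]
    distances = np.linalg.norm(sites[:, None, :] - closest, axis=2)
    # The distance to the feature is the distance to its nearest segment.
    return distances.min(axis=1)


sites = np.array([[0.0, 5.0], [10.0, 0.0]])
coastline = np.array([[[0.0, 0.0], [10.0, 0.0]]])
river = np.array([[[5.0, -5.0], [5.0, 5.0]]])
print(distance_to_feature(sites, coastline))
print(distance_to_feature(sites, river))
```

Nothing in the function knows whether the segments describe a coastline or a river, which mirrors the observation that generate_distance_to_ocean() was never ocean-specific.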

Testing:

Ran tests and they passed OK
Added new tests for the new feature(s)

mo-jbeaver

One minor suggestion added, but overall happy with the changes made.

mo-jbeaver · 2026-05-15T13:24:23Z

+        site_cube: Cube,
+        geometry: GeoDataFrame,
+        exclude_outside_of: Optional[GeoDataFrame] = None,
+        exclusion_buffer: float = 10,


If this is only used when exclude_outside_of is provided, should it not be optional too?

Yes. Implemented as suggested.
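One way to express "only used if exclude_outside_of is provided" in a signature is sketched below. It assumes that a buffer supplied without a valid-region geometry should be rejected rather than silently ignored; the PR does not say which behaviour was chosen. The function name and the module-level default are hypothetical, and the value 10 simply mirrors the default in the diff above.

```python
from typing import Optional

# Hypothetical default, mirroring the value of 10 in the diff above.
DEFAULT_EXCLUSION_BUFFER = 10.0


def resolve_exclusion_buffer(
    exclude_outside_of: Optional[object] = None,
    exclusion_buffer: Optional[float] = None,
) -> Optional[float]:
    """Return the buffer to apply, or None when no valid region is given.

    ``exclude_outside_of`` stands in for the GeoDataFrame argument; only
    whether it is None matters here.
    """
    if exclude_outside_of is None:
        if exclusion_buffer is not None:
            # Assumed design choice: reject a buffer that would be ignored.
            raise ValueError(
                "exclusion_buffer is only used when exclude_outside_of is provided."
            )
        return None
    # Fall back to the default only when a valid region was actually supplied.
    return DEFAULT_EXCLUSION_BUFFER if exclusion_buffer is None else exclusion_buffer
```

Making both arguments Optional keeps the signature honest: a caller who never restricts the region never has to think about the buffer.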

MoseleyS

I thought I'd try to review something to help out. Not sure my brain is working quite as well as I hoped. Anyway, here are two thoughts to accept or ignore.

MoseleyS · 2026-05-18T18:48:50Z

                Areas projection suitable for the UK.
-            new_name:
-                The name of the output cube.
+            output_name:


While I agree that output_name is more descriptive, there are lots of examples of new_name for this purpose in IMPROVER. I favour consistency here, but feel free to argue back.

Reverted the change, as suggested, and made the required associated changes across files that use this class.

MoseleyS · 2026-05-18T18:52:58Z

        site_coords, geometry_projection = self.project_geometry(geometry, site_cube)

-        if self.clip_geometry_flag:
+        if self._should_clip_geometry:


"Should" sounds optional. I agree that renaming is good, but something imperative might be better - _do_geometry_clipping?

Implemented as suggested.

mo-AliceLake · 2026-05-19T10:16:01Z

+        exclude_outside_of: GeoDataFrame,
+        exclusion_buffer: float,
+    ) -> List[int]:
+        """Apply exclusion geometry logic: set distances to 0 for sites outside the
+        exclusion geometry.


This may just be me, but I find it a bit counterintuitive that sites outside the exclusion geometry are set to 0.

I think exclude_outside_of is a great, clear name, so maybe this just needs to be a bit clearer in the docstring description? Perhaps something like "set distances to 0 for sites outside of the provided geometry"?

Implemented as suggested.

mo-AliceLake · 2026-05-19T10:18:06Z

+            distance_results:
+                The calculated distances to the feature.
+            exclude_outside_of:
+                A GeoDataFrame containing the exclusion geometry. Sites outside this


Same as above, find "exclusion geometry" a bit counterintuitive. Could we go with something like "geometry defining the valid region" instead of "exclusion geometry"?

Implemented as suggested, and made some small changes throughout the docs to reflect this update.
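A possible reading of the agreed semantics, sketched under stated assumptions: each site's distance to the geometry defining the valid region is assumed to be precomputed (0 for sites inside it), and the buffer is assumed to widen that region outwards in the same units. The function name and argument layout are hypothetical; only the `List[int]` return type comes from the diff.

```python
from typing import List, Sequence


def apply_valid_region(
    distance_results: Sequence[float],
    distance_to_valid_region: Sequence[float],
    exclusion_buffer: float,
) -> List[int]:
    """Set distances to 0 for sites outside the provided geometry (plus buffer).

    Hypothetical sketch: ``distance_to_valid_region`` holds each site's distance
    to the geometry defining the valid region, 0 for sites inside it.
    """
    return [
        # Sites beyond the buffered valid region get 0; the rest keep their
        # distance, rounded (an assumption) to match the List[int] return type.
        0 if outside_by > exclusion_buffer else round(distance)
        for distance, outside_by in zip(distance_results, distance_to_valid_region)
    ]


print(apply_valid_region([120.4, 35.6, 900.0], [0.0, 4.0, 50.0], 10))
```

Here the second site lies just outside the geometry but within the buffer, so it keeps its distance; the third is well outside and is set to 0.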

mo-AliceLake · 2026-05-19T10:58:22Z

+            A cube containing the distance to closest feature ancillary data.
+        """
+        import numpy as np
+


My main comment for this script:

I know you ask for this in the docstring, but for safety, can we include a check here that each cube contains the same sites (in the same order, etc.)? Similarly, as we rely on the first cube in the list to provide the metadata, can we check that all the metadata matches?

I think we might also need to test that distance_to_features isn't empty.

I have now tested:

metadata compatibility: cubes contain the same sites (incl. in the same order) and other dimension metadata.

distance_to_features isn't empty
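The requested safety checks might look something like this. `SiteField` is a hypothetical lightweight stand-in for an iris site cube so that the sketch runs without iris; with real cubes the comparison would presumably be made on the site coordinates and attributes. Names are deliberately not compared, because each input legitimately carries a different name (for example distance_to_ocean and distance_to_rivers).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class SiteField:
    """Hypothetical stand-in for a site cube: one distance per site."""

    name: str
    site_ids: Tuple[str, ...]
    data: np.ndarray
    attributes: Dict[str, str] = field(default_factory=dict)


def check_inputs_compatible(fields: List[SiteField]) -> None:
    """Raise ValueError if the inputs cannot safely be combined site by site."""
    if not fields:
        raise ValueError("At least one distance-to-feature input is required.")
    template = fields[0]  # the first input provides the metadata downstream
    for other in fields[1:]:
        if other.site_ids != template.site_ids:
            raise ValueError(
                f"{other.name!r} does not contain the same sites, in the same "
                f"order, as {template.name!r}."
            )
        if other.data.shape != template.data.shape:
            raise ValueError(
                f"{other.name!r} has shape {other.data.shape}, "
                f"expected {template.data.shape}."
            )
        if other.attributes != template.attributes:
            raise ValueError(
                f"{other.name!r} has attributes that differ from {template.name!r}."
            )
```

Checking site order, not just site membership, matters because the later minimum is taken element by element.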

mo-AliceLake · 2026-05-19T11:29:16Z

+
+        # Calculate the minimum distance across all features
+        distances_to_features = np.stack([cube.data for cube in distance_to_features])
+        min_distance = np.min(distances_to_features, axis=0)


We could also use distances_to_features.min(axis=0) here since this is a NumPy array, but both are equivalent, so just a style preference. 🙂

Implemented as suggested.
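The equivalence mentioned above is easy to confirm on a small example; the distance values here are invented purely for illustration.

```python
import numpy as np

# Invented distances (arbitrary units) from three sites to three feature types.
distance_to_ocean = np.array([1200.0, 50.0, 3000.0])
distance_to_rivers = np.array([400.0, 800.0, 2500.0])
distance_to_lakes = np.array([900.0, 75.0, 100.0])

# Stack to shape (n_features, n_sites) and reduce over the feature axis.
distances_to_features = np.stack(
    [distance_to_ocean, distance_to_rivers, distance_to_lakes]
)
min_distance = distances_to_features.min(axis=0)

# The method form and the function form give identical results.
assert np.array_equal(min_distance, np.min(distances_to_features, axis=0))
```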

mo-AliceLake · 2026-05-19T11:32:56Z

+        min_distance = np.min(distances_to_features, axis=0)
+
+        # Create a new cube for the distance to closest feature
+        distance_to_closest = distance_to_features[0].copy(data=min_distance)


This makes sense 🙂 might be worth making it explicit (e.g. via a small comment or explicitly creating a template_cube variable) that the first cube is being used as the metadata template here?

Added a small extension to the comment
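The template-cube approach can be sketched without iris by using a hypothetical frozen dataclass as a stand-in for a cube; in the PR itself the template is copied with `distance_to_features[0].copy(data=min_distance)`. The sketch also takes a required name, in line with the decision later in this review to make the output name argument required.

```python
import dataclasses
from typing import List

import numpy as np


@dataclasses.dataclass(frozen=True)
class SiteField:
    """Hypothetical stand-in for a site cube."""

    name: str
    units: str
    site_ids: tuple
    data: np.ndarray


def distance_to_closest_feature(fields: List[SiteField], new_name: str) -> SiteField:
    min_distance = np.stack([f.data for f in fields]).min(axis=0)
    # The first input acts as the metadata template: units and site
    # identifiers are copied from it; only the data and name are replaced.
    template_cube = fields[0]
    return dataclasses.replace(template_cube, name=new_name, data=min_distance)
```

`dataclasses.replace` returns a new object, so the input used as the template is left untouched.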

mo-AliceLake · 2026-05-19T11:36:45Z

+        if output_name:
+            distance_to_closest.rename(output_name)


Is this the behaviour we really want here? I worry it could be confusing to end up with a cube called (for example) distance_to_sea, when it actually represents distance to sea and rivers and lakes, etc., and that traceability has been lost.

Would it be better to default to None/"undefined", or alternatively make output_name a required argument?

Good spot. I have made the argument required across both plugins.

mo-AliceLake · 2026-05-19T11:51:18Z

@@ -0,0 +1,66 @@
+# (C) Crown Copyright, Met Office. All rights reserved.


This is a great start to the unit tests for this script 🙂

A few follow-ups based on earlier comments. It might be worth adding some additional tests to make the intended behaviour a bit more robust:

If we default to None / "undefined" when no output_name is provided, it would be good to include a test for that.

Since we rely on the first cube as a metadata template, it would be good to include a test that an error is raised if the metadata is not consistent across cubes.

It would also be helpful to explicitly check behaviour for an empty CubeList (e.g. raising a clear error).

Finally, since we assume all input cubes contain the same sites, a test ensuring that mismatched shapes or ordering raise an error would help avoid silent issues.

Implemented tests that cover the above suggestions.
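Tests along these lines could cover the cases listed above. They are written against a hypothetical, minimal `distance_to_closest` function rather than the real plugin, and use the standard-library `unittest` module so the sketch runs without extra dependencies; each `assertRaises` maps directly onto `pytest.raises`.

```python
import unittest
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SiteField:
    """Hypothetical stand-in for a site cube: one distance per site."""

    site_ids: Tuple[str, ...]
    data: Tuple[float, ...]
    attributes: Dict[str, str] = field(default_factory=dict)


def distance_to_closest(fields: List[SiteField]) -> Tuple[float, ...]:
    """Hypothetical function under test: per-site minimum distance."""
    if not fields:
        raise ValueError("No distance-to-feature inputs were provided.")
    template = fields[0]
    for other in fields[1:]:
        if other.site_ids != template.site_ids:
            raise ValueError("Inputs do not contain the same sites in the same order.")
        if other.attributes != template.attributes:
            raise ValueError("Input metadata is not consistent across inputs.")
    return tuple(min(values) for values in zip(*(f.data for f in fields)))


class TestDistanceToClosest(unittest.TestCase):
    def setUp(self):
        self.ocean = SiteField(("site_a", "site_b"), (5.0, 1.0))
        self.rivers = SiteField(("site_a", "site_b"), (2.0, 3.0))

    def test_minimum_per_site(self):
        result = distance_to_closest([self.ocean, self.rivers])
        self.assertEqual(result, (2.0, 1.0))

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            distance_to_closest([])

    def test_mismatched_site_order_raises(self):
        reordered = SiteField(("site_b", "site_a"), (2.0, 3.0))
        with self.assertRaises(ValueError):
            distance_to_closest([self.ocean, reordered])

    def test_mismatched_site_count_raises(self):
        shorter = SiteField(("site_a",), (2.0,))
        with self.assertRaises(ValueError):
            distance_to_closest([self.ocean, shorter])

    def test_inconsistent_metadata_raises(self):
        other = SiteField(("site_a", "site_b"), (2.0, 3.0), {"source": "other"})
        with self.assertRaises(ValueError):
            distance_to_closest([self.ocean, other])
```

Saved as a test module, these can be run with `python -m unittest`.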

mo-AliceLake

Overall the changes look good, and the code is clear. I found it much easier to follow than before these changes, particularly the explanation of why DistanceToClosestFeature might be needed/used. 🙂

I've left a few smaller comments around docstrings, happy for those to be taken or left as they are.

My main thought is that it's probably worth making some of the behaviour in the DistanceToClosestFeature class a bit more robust by adding explicit checks, as there are a few edge cases in which it could silently fail and/or give misleading output.

maxwhitemet

Thank you for your reviews @mo-AliceLake, @mo-jbeaver, @MoseleyS. I have responded to your feedback and implemented all suggestions.

mo-AliceLake

Looks great, can see all comments have been addressed - thanks Max. 🙂

mo-jbeaver

Happy with the updates made and the tests passed successfully.

maxwhitemet added 2 commits May 13, 2026 16:52

Changes to how we do distance_to calculations (b1c817f)

Fix incorrect test arg name following refactoring (b41136e)

maxwhitemet force-pushed the distance_to_modifications branch from 1d332cf to b41136e May 13, 2026 15:54

mo-jbeaver requested changes May 18, 2026

MoseleyS reviewed May 18, 2026

mo-AliceLake reviewed May 19, 2026 (three reviews, with comment threads on improver/generate_ancillaries/generate_distance_to_feature.py)

mo-AliceLake requested changes May 19, 2026

maxwhitemet added 3 commits May 26, 2026 09:12

Merge branch 'master' into distance_to_modifications (b6792a8)

Modifications following review (a96c391)

Fix test failures by adding required positional argument (72e1d4f)

maxwhitemet commented May 27, 2026

maxwhitemet requested a review from mo-AliceLake May 27, 2026 17:00

mo-AliceLake approved these changes May 28, 2026

maxwhitemet requested a review from mo-jbeaver May 28, 2026 09:47

mo-jbeaver approved these changes May 28, 2026

Merge branch 'master' into distance_to_modifications (067185c)

maxwhitemet merged commit 01e222c into metoppv:master May 28, 2026 (11 of 13 checks passed)
