Changes to how we do distance_to calculations#2379
Conversation
1d332cf to
b41136e
Compare
mo-jbeaver
left a comment
There was a problem hiding this comment.
One minor suggestion added, but overall happy with the changes made.
| site_cube: Cube, | ||
| geometry: GeoDataFrame, | ||
| exclude_outside_of: Optional[GeoDataFrame] = None, | ||
| exclusion_buffer: float = 10, |
There was a problem hiding this comment.
If Only used if exclude_outside_of is provided, should this not be optional too?
There was a problem hiding this comment.
Yes. Implemented as suggested.
MoseleyS
left a comment
There was a problem hiding this comment.
I thought I'd try to review something to help out. Not sure my brain is working quite as well as I hoped. Anyway, here are two thoughts to accept or ignore.
| Areas projection suitable for the UK. | ||
| new_name: | ||
| The name of the output cube. | ||
| output_name: |
There was a problem hiding this comment.
While I agree that output_name is more descriptive, there are lots of examples of new_name for this purpose in IMPROVER. I favour consistency here, but feel free to argue back.
There was a problem hiding this comment.
Reverted the change, as suggested, and made the required associated changes across files that use this class.
| site_coords, geometry_projection = self.project_geometry(geometry, site_cube) | ||
|
|
||
| if self.clip_geometry_flag: | ||
| if self._should_clip_geometry: |
There was a problem hiding this comment.
"Should" sounds optional. I agree that renaming is good, but something imperative might be better - _do_geometry_clipping?
There was a problem hiding this comment.
Implemented as suggested.
| exclude_outside_of: GeoDataFrame, | ||
| exclusion_buffer: float, | ||
| ) -> List[int]: | ||
| """Apply exclusion geometry logic: set distances to 0 for sites outside the | ||
| exclusion geometry. |
There was a problem hiding this comment.
This may just be me, but I find it a bit counterintuitive that sites outside the exclusion geometry are set to 0.
I think exclude_outside_of is a great, clear name - so maybe this is just something that just needs to be a bit clearer in the docstring description? Perhaps something like "set distances to 0 for sites outside of the provided geometry"?
There was a problem hiding this comment.
Implemented as suggested.
| distance_results: | ||
| The calculated distances to the feature. | ||
| exclude_outside_of: | ||
| A GeoDataFrame containing the exclusion geometry. Sites outside this |
There was a problem hiding this comment.
Same as above, find "exclusion geometry" a bit counterintuitive. Could we go with something like "geometry defining the valid region" instead of "exclusion geometry"?
There was a problem hiding this comment.
Implemented as suggested, and made some small changes throughout the docs to reflect this update.
| A cube containing the distance to closest feature ancillary data. | ||
| """ | ||
| import numpy as np | ||
|
|
There was a problem hiding this comment.
My main comment for this script:
I know you ask for this in the docstring, but for safety, can we include a check here that each cube does contain the same sites (in the same order, etc)? Similarly, as we rely on the first cube in the list to provide us with metadata, can we check that all the metadata matches?
I think we might also need to test that distance_to_features isn't empty.
There was a problem hiding this comment.
I have now tested:
- metadata compatibility: cubes contain the same sites (incl. in the same order) and other dimension metadata.
distance_to_featuresisn't empty
|
|
||
| # Calculate the minimum distance across all features | ||
| distances_to_features = np.stack([cube.data for cube in distance_to_features]) | ||
| min_distance = np.min(distances_to_features, axis=0) |
There was a problem hiding this comment.
We could also use distances_to_features.min(axis=0) here since this is a NumPy array, but both are equivalent, so just a style preference. 🙂
There was a problem hiding this comment.
Implemented as suggested.
| min_distance = np.min(distances_to_features, axis=0) | ||
|
|
||
| # Create a new cube for the distance to closest feature | ||
| distance_to_closest = distance_to_features[0].copy(data=min_distance) |
There was a problem hiding this comment.
This makes sense 🙂 might be worth making it explicit (e.g. via a small comment or explicitly creating a template_cube variable) that the first cube is being used as the metadata template here?
There was a problem hiding this comment.
Added a small extension to the comment
| if output_name: | ||
| distance_to_closest.rename(output_name) |
There was a problem hiding this comment.
Is this the behaviour we really want here? I worry it could be confusing to end up with a cube called (for example) distance_to_sea, when it actually represents distance to sea and rivers and lakes, etc., and that traceability has been lost.
Would it be better to default to None/"undefined", or alternatively make output_name a required argument?
There was a problem hiding this comment.
Good spot. I have made the argument required across both plugins.
| @@ -0,0 +1,66 @@ | |||
| # (C) Crown Copyright, Met Office. All rights reserved. | |||
There was a problem hiding this comment.
This is a great start to the unit tests for this script 🙂
A few follow-ups based on earlier comments- it might be worth adding some additional tests to make the intended behaviour a bit more robust:
- If we default to
None/ "undefined" when nooutput_nameis provided, it would be good to include a test for that. - Since we rely on the first cube as a metadata template, it would be good to include a test that an error is raised if the metadata is not consistent across cubes.
- It would also be helpful to explicitly check behaviour for an empty
CubeList(e.g. raising a clear error). - Finally, since we assume all input cubes contain the same sites, a test ensuring mismatched shapes or ordering raises an error would help avoid silent issues
There was a problem hiding this comment.
Implemented tests that cover the above suggestions.
There was a problem hiding this comment.
Overall the changes look good, and the code is clear. I found it much easier to follow than before these changes, particularly the explanation of why DistanceToClosestFeature might be needed/used. 🙂
I've left a few smaller comments around docstrings, happy for those to be taken or left as they are.
My main thought is that it's probably worth making some of the behaviour in the DistanceToNearestFeature class a bit more robust by adding explicit checks, as there are a few edge case ways it could silently fail and/or give misleading output.
maxwhitemet
left a comment
There was a problem hiding this comment.
Thank you for your reviews @mo-AliceLake, @mo-jbeaver, @MoseleyS. I have responded to your feedback and implemented all suggestions.
| Areas projection suitable for the UK. | ||
| new_name: | ||
| The name of the output cube. | ||
| output_name: |
There was a problem hiding this comment.
Reverted the change, as suggested, and made the required associated changes across files that use this class.
| exclude_outside_of: GeoDataFrame, | ||
| exclusion_buffer: float, | ||
| ) -> List[int]: | ||
| """Apply exclusion geometry logic: set distances to 0 for sites outside the | ||
| exclusion geometry. |
There was a problem hiding this comment.
Implemented as suggested.
| distance_results: | ||
| The calculated distances to the feature. | ||
| exclude_outside_of: | ||
| A GeoDataFrame containing the exclusion geometry. Sites outside this |
There was a problem hiding this comment.
Implemented as suggested, and made some small changes throughout the docs to reflect this update.
| site_cube: Cube, | ||
| geometry: GeoDataFrame, | ||
| exclude_outside_of: Optional[GeoDataFrame] = None, | ||
| exclusion_buffer: float = 10, |
There was a problem hiding this comment.
Yes. Implemented as suggested.
| site_coords, geometry_projection = self.project_geometry(geometry, site_cube) | ||
|
|
||
| if self.clip_geometry_flag: | ||
| if self._should_clip_geometry: |
There was a problem hiding this comment.
Implemented as suggested.
| A cube containing the distance to closest feature ancillary data. | ||
| """ | ||
| import numpy as np | ||
|
|
There was a problem hiding this comment.
I have now tested:
- metadata compatibility: cubes contain the same sites (incl. in the same order) and other dimension metadata.
distance_to_featuresisn't empty
|
|
||
| # Calculate the minimum distance across all features | ||
| distances_to_features = np.stack([cube.data for cube in distance_to_features]) | ||
| min_distance = np.min(distances_to_features, axis=0) |
There was a problem hiding this comment.
Implemented as suggested.
| min_distance = np.min(distances_to_features, axis=0) | ||
|
|
||
| # Create a new cube for the distance to closest feature | ||
| distance_to_closest = distance_to_features[0].copy(data=min_distance) |
There was a problem hiding this comment.
Added a small extension to the comment
| if output_name: | ||
| distance_to_closest.rename(output_name) |
There was a problem hiding this comment.
Good spot. I have made the argument required across both plugins.
| @@ -0,0 +1,66 @@ | |||
| # (C) Crown Copyright, Met Office. All rights reserved. | |||
There was a problem hiding this comment.
Implemented tests that cover the above suggestions.
mo-AliceLake
left a comment
There was a problem hiding this comment.
Looks great, can see all comments have been addressed - thanks Max. 🙂
mo-jbeaver
left a comment
There was a problem hiding this comment.
Happy with the updates made and the tests passed successfully.
This PR refactors a lot of the original functionality used to create 'distance to' ancillaries (distance to ocean, rivers, lakes).
Previously,
improver/generate_ancillaries/generate_distance_to_feature.py defined 1 class (
DistanceTo), which was called by two improver/generate_ancillaries/generate_miscellaneous_ancillaries.py functions to make specific ancillaries:generate_distance_to_ocean()andgenerate_distance_to_water(). This implementation obscured useful functionality.The
generate_distance_to_ocean()function, reasonably, suggests it can only be used to generate a distance_to_ocean ancillary, which is not true if you look at the code more closely: it could for instance be used to generate ancillaries for any distance-to ancillary such as distance-to-rivers or distance-to-lakes ancillaries if you supplied a relevant ancillary (i.e. if you supplied a rivers or lakes geodataframe instead of a coast geodataframe for thecoastlineargument).I have thus (1) renamed the
DistanceToclass toDistanceToFeatureand merged useful functionality from the previously abstracting functions withingenerate_miscellaneous_ancillaries.pyinto this class, and (2) created an additional class calledDistanceToClosestFeature, refactoring the code previously used for thegenerate_distance_to_water()functionality, reflecting how this code could be used much more generally, (3) moved relevant unit tests around and (4) made the documentation more intuitive and demonstrative of possible use cases.Testing: