Redesigned dataset_compliance w/ standard names validation #373

Open
sadielbartholomew wants to merge 185 commits into NCAS-CMS:main from sadielbartholomew:validate-standard-names

sadielbartholomew (Member) commented Dec 19, 2025

Closes #366 by setting up the discussed data structure needed to close #365, reporting invalid standard names in the new output structure, as indicated in #365 (comment).

This is quite a hefty PR with a tragic number of commits, so I am happy to squash down the first ~50-100 of these. They were mostly development (and/or investigative) commits which were incrementally updated as we revised our idea for the Conformance Data Model (see the UML diagram in #365 (comment)).

Some minor follow-on work, for when we have time to restart conformance work, is to:

Outstanding questions

Aspects I am unsure about / questions:

  • cell methods and how to report about issues on those;
  • whether the Data Model should have 1..* NonConformanceCase for AttributeNonConformance, as per our UML. I think that in practice the non-conformance could be further down the chain rather than a direct association, so this should be 0..*, which is what the code in this PR assumes (does that make sense?).
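
To make the multiplicity question concrete, here is a minimal sketch, not the PR's actual classes. NonConformanceCase and AttributeNonConformance are named as in the UML, but VariableNonConformance, every field name and the count_cases helper are assumptions made for illustration. It shows an attribute with zero direct cases whose non-conformance sits further down the chain:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NonConformanceCase:
    code: int
    reason: str


@dataclass
class VariableNonConformance:
    # Hypothetical name: a variable referenced by an attribute's value
    name: str
    attributes: list[AttributeNonConformance] = field(default_factory=list)


@dataclass
class AttributeNonConformance:
    name: str
    value: str
    # 0..*: an empty list is allowed here, unlike the 1..* in the UML
    cases: list[NonConformanceCase] = field(default_factory=list)
    # Variables named by this attribute, which may carry their own cases
    variables: list[VariableNonConformance] = field(default_factory=list)


def count_cases(attr: AttributeNonConformance) -> int:
    """Count cases on this attribute and on everything it references."""
    nested = sum(count_cases(a) for v in attr.variables for a in v.attributes)
    return len(attr.cases) + nested


bad_name = AttributeNonConformance(
    "standard_name",
    "badname_cell_measure",
    cases=[NonConformanceCase(400022, "standard_name is not in the table")],
)
# cell_measures itself has no direct case, only a nested one
cell_measures = AttributeNonConformance(
    "cell_measures",
    "cell_measure",
    variables=[VariableNonConformance("cell_measure", [bad_name])],
)
```

With 1..*, cell_measures could not be constructed without inventing a direct case for it; with 0..* it simply carries the nested one.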

Review guidance

Structure of new conformance module

The UML diagrams below were generated with pyreverse. Note that they cover only the conformance module in isolation from the rest of cfdm, so they don't pick up on external connections, notably to the dataset reading logic and especially NetCDFRead. They could still be a useful overview:

Packages

packages_conf-final-conformancedir

Classes

classes_conf-final-conformancedir

Notes on PR and approach

  1. As discussed in person during development, the new conformance checking logic is implemented using a new submodule conformance which is based on a Conformance Data Model.
  2. Towards separation of concerns, I have moved all _check_* and _ugrid_check_* methods from netcdfread to the new dedicated submodule conformance.checker.
  3. Any reporting-related functionality is in conformance.reporting. The main method to note from the datamodel module for dataset_compliance is as_report_fragment: it generates a dict by recursing through all relevant *NonConformance objects that define the same method, combining their dict fragments into the (possibly heavily) nested output.
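
The recursion described in (3) can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: only the method name as_report_fragment and the shape of the output dict come from the description and the representative outputs, while the class layout, constructor arguments and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


@dataclass
class NonConformanceCase:
    code: int
    reason: str

    def as_report_fragment(self) -> dict:
        return {"code": self.code, "reason": self.reason}


@dataclass
class AttributeNonConformance:
    name: str
    value: str
    cases: list[NonConformanceCase] = field(default_factory=list)
    variables: list[VariableNonConformance] = field(default_factory=list)

    def as_report_fragment(self) -> dict:
        fragment: dict = {"value": self.value}
        if self.cases:
            fragment["non-conformance"] = [
                c.as_report_fragment() for c in self.cases
            ]
        if self.variables:
            # Recurse into each referenced variable
            fragment["variables"] = {
                v.name: v.as_report_fragment() for v in self.variables
            }
        return fragment


@dataclass
class VariableNonConformance:
    name: str
    attributes: list[AttributeNonConformance] = field(default_factory=list)

    def as_report_fragment(self) -> dict:
        return {
            "attributes": {
                a.name: a.as_report_fragment() for a in self.attributes
            }
        }


bad_name = AttributeNonConformance(
    "standard_name",
    "badname_cell_measure",
    cases=[NonConformanceCase(400022, REASON)],
)
cell_measures = AttributeNonConformance(
    "cell_measures",
    "cell_measure",
    variables=[VariableNonConformance("cell_measure", [bad_name])],
)
fragment = VariableNonConformance("ta", [cell_measures]).as_report_fragment()
```

Each object only knows how to describe itself and delegates to its children, so the depth of the final dict follows the depth of the object graph without any central assembly step.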

Advice on how to review

  • Best review the code changes as a whole rather than commit by commit (there are too many commits, and they are a bit of a mess due to the moving nature of the development goals, sorry!), though note the point below about reviewing conformance.checker.
  • Given (2) above, I realised during later merge conflict resolution that it would be difficult to see what changes I made to the _check_* and _ugrid_check_* methods, which were just to add _check_standard_name and _include_component_report calls in the right places. To make reviewing easier, I copied the main post-merge state of those methods in netcdfread and then made any changes to those once moved in 33786f5, with some further additions necessary for tweaks and fixes. So please run git diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.py to see and review all updates to the checking methods that previously lived in netcdfread.

Representative outputs

As per the new test module test_compliance_checking.py, we test on a non-UGRID 'kitchen sink' and a UGRID field with the expected outputs as follows, abiding by the Conformance Data Model:

Kitchen sink non-UGRID field

{'CF version': '1.13',
 'ta': {'attributes': {
     'ancillary_variables': {
         'value': 'air_temperature_standard_error',
         'variables': {
             'air_temperature_standard_error': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_air_temperature_standard_error'}}}}},
     'cell_measures': {
         'value': 'cell_measure',
         'variables': {
             'cell_measure': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_cell_measure'}}}}},
     'coordinates': {
         'value': 'time',
         'variables': {
             'auxiliary': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_auxiliary'}}},
             'latitude_1': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_latitude_1'}}},
             'longitude_1': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_longitude_1'}}},
             'time': {'attributes': {'standard_name': {
                 'non-conformance': [{
                     'code': 400022,
                     'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_time'}}}}},
     'standard_name': {
         'non-conformance': [{
             'code': 400022,
             'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
         'value': 'badname_ta'}}}}
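
Since the output is heavily nested, a reviewer may find it easier to scan as a flat list. The helper below is a hypothetical sketch, not part of this PR: iter_non_conformances and the "/"-joined path format are my own assumptions, and the sample dict is a trimmed copy of the kitchen sink output above.

```python
REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def iter_non_conformances(fragment, path=()):
    """Yield (path, value, code) for every case in a nested report dict."""
    if not isinstance(fragment, dict):
        return
    # Cases attached directly to this level of the report
    for case in fragment.get("non-conformance", []):
        yield "/".join(path), fragment.get("value"), case["code"]
    # Descend into every child, extending the path by its key
    for key, child in fragment.items():
        yield from iter_non_conformances(child, path + (key,))


# Trimmed copy of the kitchen sink report, keeping two of the entries
report = {
    "CF version": "1.13",
    "ta": {
        "attributes": {
            "cell_measures": {
                "value": "cell_measure",
                "variables": {
                    "cell_measure": {
                        "attributes": {
                            "standard_name": {
                                "non-conformance": [
                                    {"code": 400022, "reason": REASON}
                                ],
                                "value": "badname_cell_measure",
                            }
                        }
                    }
                },
            },
            "standard_name": {
                "non-conformance": [{"code": 400022, "reason": REASON}],
                "value": "badname_ta",
            },
        }
    },
}

found = list(iter_non_conformances(report))
for where, value, code in found:
    print(f"{code}  {value!r}  at {where}")
```

The path keeps the structural keys ('attributes', 'variables') so that a variable which happens to share one of those names cannot be confused with the structure itself.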

UGRID field

{'CF version': '1.13',
 'pa': {'attributes': {'mesh': {'value': 'Mesh2',
                                'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes',
                                                                                                  'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_edge_nodes'}}}}},
                                                                       'face_face_connectivity': {'value': 'Mesh2_face_links',
                                                                                                  'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_links'}}}}},
                                                                       'face_node_connectivity': {'value': 'Mesh2_face_nodes',
                                                                                                  'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_nodes'}}}}},
                                                                       'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                              'reason': 'standard_name '
                                                                                                                        'attribute '
                                                                                                                        'has '
                                                                                                                        'a '
                                                                                                                        'value '
                                                                                                                        'that '
                                                                                                                        'is '
                                                                                                                        'not '
                                                                                                                        'a '
                                                                                                                        'valid '
                                                                                                                        'name '
                                                                                                                        'contained '
                                                                                                                        'in '
                                                                                                                        'the '
                                                                                                                        'current '
                                                                                                                        'standard '
                                                                                                                        'name '
                                                                                                                        'table'}],
                                                                                         'value': 'badname_Mesh2'}}}}},
                       'standard_name': {'non-conformance': [{'code': 400022,
                                                              'reason': 'standard_name '
                                                                        'attribute '
                                                                        'has a '
                                                                        'value '
                                                                        'that '
                                                                        'is '
                                                                        'not a '
                                                                        'valid '
                                                                        'name '
                                                                        'contained '
                                                                        'in '
                                                                        'the '
                                                                        'current '
                                                                        'standard '
                                                                        'name '
                                                                        'table'}],
                                         'value': 'badname_air_pressure'}}}}
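
For anyone reading the nested output above, a minimal sketch of flattening it into one line per problem may help. It assumes only the keys visible in the output (`attributes`, `variables`, `value`, `non-conformance`, `code`, `reason`). The top-level variable name `pa`, the `mesh` attribute key and the helper names `bad_name` and `iter_non_conformances` are hypothetical, used for illustration only and not part of the PR.

```python
REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def bad_name(value):
    """Build one attribute entry shaped like those in the output above."""
    return {"non-conformance": [{"code": 400022, "reason": REASON}], "value": value}


# Trimmed-down report. "pa" and "mesh" are assumed names, not taken from the PR.
report = {
    "pa": {
        "attributes": {
            "mesh": {
                "value": "Mesh2",
                "variables": {
                    "Mesh2": {
                        "attributes": {
                            "face_node_connectivity": {
                                "value": "Mesh2_face_nodes",
                                "variables": {
                                    "Mesh2_face_nodes": {
                                        "attributes": {
                                            "standard_name": bad_name(
                                                "badname_Mesh2_face_nodes"
                                            )
                                        }
                                    }
                                },
                            },
                            "standard_name": bad_name("badname_Mesh2"),
                        }
                    }
                },
            },
            "standard_name": bad_name("badname_air_pressure"),
        }
    }
}


def iter_non_conformances(node, path=()):
    """Yield (path, value, code, reason) for each non-conformance found."""
    if not isinstance(node, dict):
        return
    # Report any cases attached directly to this node first.
    for case in node.get("non-conformance", []):
        yield path, node.get("value"), case["code"], case["reason"]
    # Then descend into every other key, extending the path as we go.
    for key, child in node.items():
        if key != "non-conformance":
            yield from iter_non_conformances(child, path + (key,))


found = list(iter_non_conformances(report))
for path, value, code, reason in found:
    print(code, value, "/".join(path))
```

Recording the path alongside each case shows whether a problem sits on the parent variable itself or further down the chain, on a variable it references.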

@sadielbartholomew

Member Author

Now ready for your re-review, thanks @davidhassell.

@sadielbartholomew

Member Author

Now updated after our off-line discussions relating to #404 (UGRID writing) - all resolved and tests now updated accordingly in a nice way. @davidhassell ready for re-review, thanks.

@davidhassell davidhassell left a comment

Contributor

Hi - a second pass. The structure looks good, and I noticed a couple of things. However, some UGRID tests in test_UGRID.py are failing, which shouldn't have anything to do with compliance checking. Do you see that?

test_UGRID_data (__main__.UGRIDTest.test_UGRID_data)
Test reading of UGRID data. ... ok
test_UGRID_read (__main__.UGRIDTest.test_UGRID_read)
Test reading of UGRID files. ... ok
test_read_UGRID_domain (__main__.UGRIDTest.test_read_UGRID_domain)
Test reading of UGRID files into domains. ... ok
test_read_write_UGRID_domain (__main__.UGRIDTest.test_read_write_UGRID_domain)
Test the cfdm.read and cfdm.write with UGRID domains. ... FAIL
test_read_write_UGRID_field (__main__.UGRIDTest.test_read_write_UGRID_field)
Test the cfdm.read and cfdm.write with UGRID fields. ... FAIL

======================================================================
FAIL: test_read_write_UGRID_domain (__main__.UGRIDTest.test_read_write_UGRID_domain)
Test the cfdm.read and cfdm.write with UGRID domains.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/cfdm/cfdm/test/test_UGRID.py", line 253, in test_read_write_UGRID_domain
    self.assertTrue(e[0].equals(d))
AssertionError: False is not true

======================================================================
FAIL: test_read_write_UGRID_field (__main__.UGRIDTest.test_read_write_UGRID_field)
Test the cfdm.read and cfdm.write with UGRID fields.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/cfdm/cfdm/test/test_UGRID.py", line 214, in test_read_write_UGRID_field
    self.assertTrue(g[0].equals(f))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 5 tests in 2.946s

FAILED (failures=2)

# ------------------------------------------------------------
return out

def _check_valid(self, field, construct):

Contributor

I don't think this method should be removed. Despite its name (_check_*), it is not a compliance checking method.

Member Author

Ah, good spot - I didn't actually intend to remove that. I think it was accidentally lost while moving the _check_* methods around, which in the 'new way' of this branch are meant to appear under read_write/netcdf/checker.py.

I'll add it back in at its original location in this module - but if it isn't a compliance checking method, can we please rename it to avoid confusion? Maybe warn_valid_min_max_range or similar - any sensible name that avoids the _check_* format would be fine by me.

Comment thread cfdm/read_write/netcdf/checker.py Outdated
from ...conformance.reporting import Report


class NetCDFChecker(Report):

Contributor

Perhaps make it clear that this is a Mixin class (like we do elsewhere), and note that in the docstring.

Suggested change
- class NetCDFChecker(Report):
+ class NetCDFCheckerMixin(Report):

Member Author

Very good idea - done in 9f2fb67.
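
As a rough illustration of the convention being suggested, here is a minimal sketch of a mixin named and documented as such. `Report` below is a bare stand-in for the real `conformance.reporting.Report`, whose API is not reproduced here. The method `is_known_standard_name` and the class `ExampleReader` are hypothetical and are not taken from the PR.

```python
class Report:
    """Stand-in for the real Report base class (API not reproduced here)."""


class NetCDFCheckerMixin(Report):
    """Mixin class providing netCDF conformance checks.

    Not intended to be instantiated on its own: it only adds checking
    methods to a class that inherits from it.

    """

    def is_known_standard_name(self, name, table):
        """Hypothetical helper: True if `name` is in the table of names."""
        return name in table


class ExampleReader(NetCDFCheckerMixin):
    """Hypothetical consumer that gains the checks through inheritance."""


reader = ExampleReader()
print(reader.is_known_standard_name("badname_air_pressure", {"air_pressure"}))
# prints: False
```

The `Mixin` suffix plus the docstring note signal to readers that the class exists to be combined with others rather than used directly.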

@sadielbartholomew

Member Author

Hi @davidhassell, thanks again for your latest review. I've updated the PR accordingly and now it is ready for re-review.

Note regarding the UGRID test failures: after reviewing my conflict resolution and tidy-up after the latest iteration, I realised that I'd messed up a little and missed a change from your PR #372, which had caused those failures. Namely, in wrangling the updates across the long dev period, I'd somehow missed out https://github.com/NCAS-CMS/cfdm/pull/372/changes#diff-4f825020829281f118affea51eb2cd2bb4aeebc3fe21cd71b3ed51fbba1cfd9cR10404 which I put back in d186780. I also went back to check that I'd captured everything after that, in case I'd missed something else.

All other feedback should be addressed too as per my replies to your comments in-line.

Labels

conformance
enhancement (New feature or request)
UGRID (Relating to UGRID mesh topologies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compliance reporting: flag any invalid standard names
Output for Field.dataset_compliance towards a CF Checker

2 participants