Redesigned dataset_compliance w/ standard names validation#373
Redesigned dataset_compliance w/ standard names validation#373sadielbartholomew wants to merge 185 commits into
dataset_compliance w/ standard names validation#373Conversation
This reverts commit 93cda1b. See: https://github.com/NCAS-CMS/cfdm/pull/373/changes#r3209453587 for justification.
|
Now ready for your re-review, thanks @davidhassell. |
|
Now updated after our off-line discussions relating to #404 (UGRID writing) - all resolved and tests now updated accordingly in a nice way. @davidhassell ready for re-review, thanks. |
davidhassell
left a comment
There was a problem hiding this comment.
Hi - a second pass. The structure look good, and a couple of things I noticed. However, the some UGRID tests in test_UGRID.py are failing which shouldn't have anything to do with compliance checking .. do you see that?
test_UGRID_data (__main__.UGRIDTest.test_UGRID_data)
Test reading of UGRID data. ... ok
test_UGRID_read (__main__.UGRIDTest.test_UGRID_read)
Test reading of UGRID files. ... ok
test_read_UGRID_domain (__main__.UGRIDTest.test_read_UGRID_domain)
Test reading of UGRID files into domains. ... ok
test_read_write_UGRID_domain (__main__.UGRIDTest.test_read_write_UGRID_domain)
Test the cfdm.read and cfdm.write with UGRID domains. ... FAIL
test_read_write_UGRID_field (__main__.UGRIDTest.test_read_write_UGRID_field)
Test the cfdm.read and cfdm.write with UGRID fields. ... FAIL
======================================================================
FAIL: test_read_write_UGRID_domain (__main__.UGRIDTest.test_read_write_UGRID_domain)
Test the cfdm.read and cfdm.write with UGRID domains.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/david/cfdm/cfdm/test/test_UGRID.py", line 253, in test_read_write_UGRID_domain
self.assertTrue(e[0].equals(d))
AssertionError: False is not true
======================================================================
FAIL: test_read_write_UGRID_field (__main__.UGRIDTest.test_read_write_UGRID_field)
Test the cfdm.read and cfdm.write with UGRID fields.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/david/cfdm/cfdm/test/test_UGRID.py", line 214, in test_read_write_UGRID_field
self.assertTrue(g[0].equals(f))
AssertionError: False is not true
----------------------------------------------------------------------
Ran 5 tests in 2.946s
FAILED (failures=2)| # ------------------------------------------------------------ | ||
| return out | ||
|
|
||
| def _check_valid(self, field, construct): |
There was a problem hiding this comment.
I don't think this method should be removed. Despite it's name (_check_*), it is not a compliance checking method.
There was a problem hiding this comment.
Ah, good spot - I didn't actually intend to remove that, I think that was accidentally lost during the move around of _check_* methods so meant to appear under read_write/netcdf/checker.py in the 'new way' of this branch.
I'll add it back in at its original location in this module - but if it isn't a compliance checking method, to avoid confusion, can we please rename it? Maybe warn_valid_min_max_range or similar - anything avoiding the _check_* format that is sensible would be fine by me?
| from ...conformance.reporting import Report | ||
|
|
||
|
|
||
| class NetCDFChecker(Report): |
There was a problem hiding this comment.
Perhaps make it clear that this is a Mixin class (like we do elsewhere), and not that in the dosctring.
| class NetCDFChecker(Report): | |
| class NetCDFCheckerMixin(Report): |
|
Hi @davidhassell, thanks again for your latest review. I've updated the PR accordingly and now it is ready for re-review. Note regarding the UGRID test failures, after reviewing my conflict resolution and tidy after the latest iteration I realised that I'd messed up a little and missed a change from your PR #372 which had caused those failures, namely I had somehow in wrangling the updates across the long dev period I'd missed out: https://github.com/NCAS-CMS/cfdm/pull/372/changes#diff-4f825020829281f118affea51eb2cd2bb4aeebc3fe21cd71b3ed51fbba1cfd9cR10404 which I put back in d186780. I also went back to check I'd captured everything after that in case I'd missed something else. All other feedback should be addressed too as per my replies to your comments in-line. |
Close #366 by setting up discussed data structure to close #365, reporting invalid standard names in the new output structure, as indicated in #365 (comment).
Is quite a hefty PR with a tragic amount of commits, so happy to squash down the first ~50-100 of these, which were mostly development (and/or investigative behaviour) commits which were incrementally updated as we revised our idea for the Conformance Data Model (see UML diagram in #365 (comment)).
Some minor follow on work when we have time to restart conformance work is to:
Outstanding questions
Aspects I am unsure about / questions:
1..*NonConformanceCaseforAttributeNonConformanceas per our UML - I think in practice the non-conformance could be further down the chain, not a direct association - so I think this should be0..*and that is what this PR code assumes (does that make sense?).Review guidance
Structure of new
conformancemoduleUML diagrams generated with
pyreverse, though note they only include theconformancemodule separate to the wholecfdmmodule, so don't pick up on external connections notably to all dataset reading logic especiallyNetCDFRead. But could be a useful overview:Packages
Classes
Notes on PR and approach
conformancewhich is based on a Conformance Data Model._check_*and_ugrid_check_*method fromnetcdfreadto the new dedicated submoduleconformance.checker.conformance.reporting.as_report_fragmentis the ultimate main method from thedatamodelmodule to note fordataset_compliance- it generates adictby recursively operating on all relevant*NonConformanceobjects with the same method defined, to generate a structure from all of the dict fragments resulting in the possibly (heavily-)nested output.Advice on how to review
conformance.checker._check_*and_ugrid_check_*methods, which is just to add_check_standard_nameand_include_component_reportcalls in the right places. To make reviewing easier I copied themainpost-merge state of those methods innetcdfreadand then made any changes to those once moved in 33786f5, with some further additions necessary for tweaks and fixes, so please rungit diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.pyto see and review all updates to the checking methods previously living innetcdfread.Representative outputs
As per the new test module
test_compliance_checking.py, we test on a non-UGRID 'kitchen sink' and a UGRID field with the expected outputs as follows, abiding by the Conformance Data Model:Kitchen sink non-UGRID field
{'CF version': '1.13', 'ta': {'attributes': {'ancillary_variables': {'value': 'air_temperature_standard_error', 'variables': {'air_temperature_standard_error': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_temperature_standard_error'}}}}}, 'cell_measures': {'value': 'cell_measure', 'variables': {'cell_measure': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_cell_measure'}}}}}, 'coordinates': {'value': 'time', 'variables': {'auxiliary': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_auxiliary'}}}, 'latitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_latitude_1'}}}, 'longitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_longitude_1'}}}, 'time': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_time'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_ta'}}}}UGRID field
{'CF version': '1.13', 'pa': {'attributes': {'mesh': {'value': 'Mesh2', 'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes', 'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_edge_nodes'}}}}}, 'face_face_connectivity': {'value': 'Mesh2_face_links', 'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_links'}}}}}, 'face_node_connectivity': {'value': 'Mesh2_face_nodes', 'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_nodes'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_pressure'}}}}