Xarray dumpers / loaders#16
Conversation
| # If values are length 1 we are dealing with coords like date | ||
| v = v.values | ||
| try: | ||
| node_dict["coords"][k] = v |
There was a problem hiding this comment.
| node_dict["coords"][k] = v | |
| node_dict["coords"][k] = {"data": v, "dims": k} |
This would be consistent with the lines below. Not sure if dims is necessary, but I think so?
|
|
||
|
|
||
| def _create_node(model, schemaview): | ||
| """Create datatree from temperature dataset""" |
There was a problem hiding this comment.
Add comment with pointer to the data model of a Dataset's attrs/coords/data_vars/dims.
| node_dict["coords"][k].update({"attrs": {key: value}}) | ||
| else: | ||
| # conversion factor | ||
| node_dict["data_vars"] = {k: {"attrs": {key: value}}} |
There was a problem hiding this comment.
What if there is more than one? Here the attrs dict is set to a dictionary with just this kv pair. I think this should call _create_node recursively.
| if isinstance(v, BaseModel): | ||
| # create a subgroup and recurse | ||
| if "values" in v.__dir__(): | ||
| dims = ["y","x"] |
There was a problem hiding this comment.
TODO: see if we can get this from the schemaview.
| return output_file_path | ||
|
|
||
|
|
||
| class YamlXarrayZarrDumper(YamlArrayFileDumper): |
There was a problem hiding this comment.
I am sorta having a tough time telling whats going on in this PR (sorry for missing the meeting, I never know when they are), but im doing this rn and u will definitely want to have a model for this, otherwise all the ser/des logic gets packed into a huge nesty dumper/loader and u lose the declarative relationship between in memory and dumped.
|
As just a general comment, when working with linkml models which can contain a ton of references between objects, and we cant/shouldnt assume hierarchical structure, I have found that recursive patterns become sort of a liability. Its one reason I restructured nwb-linkml stuff to be a sort of bidirectional passing of results objects up and down system for model generation, bc the decision about what to do at any given node is dependent on at least the parent/child ( I see some switches in here for deciding whether or not we're a dataset or an attr, for example, and also some comments about "what if there is more than one of this"). For loading/dumping, I think u will probably want to structure it like a graph that models dependencies between the nodes. My still-pretty-crappy-and-repetitive version of this for nwb is over here: https://github.com/p2p-ld/nwb-linkml/blob/main/nwb_linkml/src/nwb_linkml/io/hdf5.py |
|
also maybe relevant in general im about 2 do some stuff like dis class MyModel(BaseModel):
array: NDArraymodel = MyModel(array="data/test.avi")
print_json(model.model_dump_json(round_trip=True)){"array": {"type": "video", "file": "data/test.avi"}}model.model_dump_json(round_trip=True){
"array": {
"type": "dask",
"name": "array-2a39187fc9fcee3f4cdbc1f2911b4b92",
"chunks": [[2], [2]],
"dtype": "int64",
"array": [[1, 2], [3, 4]]
}
} model.model_dump_json(
round_trip=True,
context={"mark_interface": True}
)){
"array": {
"interface": {
"module": "numpydantic.interface.hdf5",
"cls": "H5Interface",
"version": "1.5.3"
},
"value": {
"type": "hdf5",
"file": "data/test.h5",
"path": "/data",
"field": null
}
}
}and so on. and da structure of that is all controllable by the interface class so ya can make your own serialization no sweat eg: https://github.com/p2p-ld/numpydantic/blob/135b74aa2e4d6e5a755c27d3bb7fe170650058d3/src/numpydantic/interface/hdf5.py#L80-L91 |
|
Sorry for the late comment but you are absolutely right @sneakers-the-rat, when implementing it this way I indeed ran into the problem that the recursiveness makes it difficult to to traverse up and down the graph if required, where a single level within the schema view might not give the required information on its own how to exactly deal with the incoming information. |
|
@rly I think this would be good to discuss this evening |
Adding xarray dumpers and loaders