Add a schema registry #264
…indings packages.
|
I do not understand why some external modules appear in the committed json files? |
|
I will explain it better at the meeting tomorrow. It's a temporary, quite ugly fix I added to be able to register the schemas for the binding packages and convert the format of the yaml files. There is also one related ugly thing in the pyproject.toml at the moment. I didn't manage to figure out a less ugly way to handle it. My plan is that all of that should go away once people have converted the yaml file format and I have updated the packages for the bindings to use entry points to find the schemas in there instead. |
|
OK thanks |
|
For the json schema I committed the full schema including the external packages to be able to provide a link which people can load directly in the MetaConfigurator for testing without having to generate it themselves. But I'm not sure whether there should be a "standard" schema available in the repository in the future or not. I kind of like the idea of having a pregenerated schema including the most common external packages, so new users can get started without having to generate the schema themselves. Then you only need to generate your own schema if you have facility-specific packages. But if there should be such a basic, standard schema, maybe it should be in its own repository. In that repository we could then also have different files for different parts of the schema if we want. |
|
OK so it would be nice to provide a way to generate this schema with the modules you want to use. |
|
That I think already works. As long as everything you want to include is registered in the schema registry, it will show up in the json schema. The key is to get external packages to register automatically in the schema registry and to make that compatible with what the catalog needs. Those parts don't work automatically yet and might require some discussion to figure out the best way, since I'm still working on understanding the details of the catalog to be able to make it compatible. But at least you can always manually register everything you want to include. |
|
I'm starting to look at your implementation. |
Yes, my plan is to make the validation a completely separate step done before the factory so the factory is only responsible for building objects. I'm aware of the error line number and also the function to reformat the validation errors from pydantic so they become easier to understand. I want to move that functionality into the SchemaValidator as part of refactoring the factory. |
OK. The refurbishment of the factory should not be a big deal. You only need to use the new class field to construct the object and expand the dictionary. What I dislike here is the duplication of the constructor signature (as discussed a long time ago), which can be a source of errors that are difficult to debug. But I see no other way to allow construction by code without having to construct the schema first. |
I also disliked the duplication of the constructor signature initially. It was annoying to have to write the same thing twice. I tried some ideas to avoid it but found none that I liked and that was easier to use than copy-paste. But after a while I actually started to like it because it made it explicit what the attributes are for building the domain objects. I found several cases where there were attributes in the ConfigModel which were never used for anything because they were hidden inside the I also liked how it forced me to think about the scope of the class. In a pydantic BaseModel it is very easy to add 10 attributes but when you need to add all of them again to
|
My main problem is that this duplication can be a source of errors: if you forget one parameter or enter them in the wrong order, you may face issues that are tricky to solve. Duplication is generally not a good idea. Today:
# Set up the chromaticity monitor (override config settings)
CM = SR.get_chromaticity_monitor("CHROMATICITY_MONITOR")
CM._cfg.n_step = 3 # 3 points for chroma fit
CM._cfg.n_avg_meas = 1 # No averaging
should be:
# Set up the chromaticity monitor (override config settings)
CM = SR.get_chromaticity_monitor("CHROMATICITY_MONITOR")
CM.set_n_step(3) # 3 points for chroma fit
CM.set_n_avg_meas(1) # No averaging |
I'm not sure I understand. In the schema there should be no method to change the configuration? The purpose of the schema classes is purely to define which fields are required for validation. You would then use the However, if people like the idea, it is possible to later make an implementation where dynamic changes of the configuration also call the |
|
My remark was rather about the consequences of your validation model (not about the schema itself) and the fact that fields have to be duplicated at the object level, which means that you in fact have 3 duplications:
In an ideal world, to follow the pyaml coding style, I would like to be able to write:
# Set up the chromaticity monitor (override config settings)
CM = SR.get_chromaticity_monitor("CHROMATICITY_MONITOR")
CM.n_step.set(3) # 3 points for chroma fit
CM.n_avg_meas.set(1) # No averaging
and to have a mechanism (using a decorator or dynamic code generation) that automatically maps schema fields to object getters/setters, with an optional callback which allows the object to be informed of field updates. It will require being able to select at the schema level whether a field is R or RW. |
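One way the mechanism requested above could look is sketched here with a Python descriptor. This is only an illustration under stated assumptions: `ConfigField`, `FieldAccessor`, `record_update` and this `ChromaticityMonitor` class are hypothetical names and not part of pyaml, and the field list is declared on the object rather than derived from a schema.

```python
from typing import Any, Callable, Optional


class FieldAccessor:
    """Accessor bound to one object and one field, exposing get() and set()."""

    def __init__(self, owner: Any, name: str, writable: bool,
                 on_change: Optional[Callable[[Any, str, Any], None]]):
        self._owner = owner
        self._name = name
        self._writable = writable
        self._on_change = on_change

    def get(self) -> Any:
        return self._owner._values[self._name]

    def set(self, value: Any) -> None:
        if not self._writable:
            raise AttributeError(f"field '{self._name}' is read-only")
        self._owner._values[self._name] = value
        # Optional callback so the object is informed of the update
        if self._on_change is not None:
            self._on_change(self._owner, self._name, value)


class ConfigField:
    """Hypothetical descriptor declaring a field as R (writable=False) or RW."""

    def __init__(self, default: Any, writable: bool = True, on_change=None):
        self.default = default
        self.writable = writable
        self.on_change = on_change

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, instance: Any, owner: Optional[type] = None):
        if instance is None:
            return self
        return FieldAccessor(instance, self.name, self.writable, self.on_change)

    def __set__(self, instance: Any, value: Any) -> None:
        # Plain assignment is rejected so that updates always go through set()
        raise AttributeError(f"use {self.name}.set(...) instead of assignment")


def record_update(obj: Any, name: str, value: Any) -> None:
    """Example callback: remember which fields were changed."""
    obj.updates.append((name, value))


class ChromaticityMonitor:
    n_step = ConfigField(5, on_change=record_update)
    n_avg_meas = ConfigField(4, on_change=record_update)
    name = ConfigField("CHROMATICITY_MONITOR", writable=False)

    def __init__(self, **overrides: Any):
        self.updates = []
        # Collect the declared fields and apply any constructor overrides
        self._values = {
            key: overrides.get(key, field.default)
            for key, field in vars(type(self)).items()
            if isinstance(field, ConfigField)
        }


CM = ChromaticityMonitor()
CM.n_step.set(3)       # 3 points for chroma fit
CM.n_avg_meas.set(1)   # No averaging
```

In this sketch the field names still appear once on the object, so it does not by itself remove the duplication with the schema; it only shows the getter/setter and R/RW part of the idea.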
If you want that, I think it should be implemented in a way that makes the object constructor signature the source of truth. There is a way in pydantic to do that using |
|
OK. For me it is a bit heavy to have all these duplications and also to lose the repr from pydantic. For instance, this will not work:
arrays: list[ArraySchema] = Field(default_factory=list, repr=False)
devices: list[ElementSchema] = Field(default_factory=list, repr=False)
Anyway, this refurbishment is heavy and we need to stabilize the implementation ASAP. |
|
The only option I have found to avoid the duplication, and which I have some belief in, is discussed here: https://stackoverflow.com/questions/65888153/creating-a-pydantic-model-dynamically-from-a-python-dataclass That would define dataclasses in the business logic which would be dynamically translated into pydantic BaseModels for the validation. But in the end, my conclusion after considering different options is the same as the one someone else reaches in the comments there: it's better to just write it twice, because over time the internal model and the validation schema might start to differ. I think we have already started to see that. Validation and json schema generation don't work as well when arbitrary types are allowed, but it wouldn't make sense to restrict the domain classes so they can't take custom classes as input. For the majority of the schema classes I have introduced, there is duplication of the attribute names but not of their types, for exactly this reason. The validation schema and the internal model work better when they are different because they have different purposes and therefore different requirements. |


This PR adds a schema registry that can be used for validation and generating json schemas for dynamic nested pydantic models.
Motivation:
Better separation of concerns. Validation of the configuration is separated into different classes than the ones responsible for storing and using the configuration, resulting in a separate validation layer.
It is possible to make validation optional. The entire validation layer can be skipped if the user wishes, for example if they know that the input data has already been validated once and has not changed.
Looser dependency on pydantic. Pydantic is only used at the edge of the core and not everywhere, in line with what has been discussed for how to manage the dependency on pint.
Solving two problems with pydantic:
Features:
A schema registry which maps a class path to the schema that should be used for validation of the input data to the class.
A decorator to automatically register schemas in the schema registry.
A schema validator which validates nested dynamic models with the use of the registry.
A custom json schema generator which generates json schemas including all available subclasses with the use of the registry.
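To illustrate how the registry, the decorator and the nested validation listed above fit together, here is a minimal, dependency-free sketch. It is not the implementation in this PR: plain dataclasses stand in for the pydantic schema classes, and `SCHEMA_REGISTRY`, `register_schema`, `validate`, the example class paths and the `type` key convention are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable

# Hypothetical registry: class path of a domain class -> schema used to validate it
SCHEMA_REGISTRY: dict[str, type] = {}


def register_schema(class_path: str) -> Callable[[type], type]:
    """Decorator that registers a schema under the class path it validates."""
    def decorator(schema_cls: type) -> type:
        SCHEMA_REGISTRY[class_path] = schema_cls
        return schema_cls
    return decorator


@register_schema("example.magnets.Quadrupole")
@dataclass
class QuadrupoleSchema:
    name: str
    strength: float


@register_schema("example.lattice.Ring")
@dataclass
class RingSchema:
    name: str
    elements: list


def _is_typed(value: Any) -> bool:
    """A nested model is assumed to be a dict that carries a 'type' key."""
    return isinstance(value, dict) and "type" in value


def validate(config: dict[str, Any]) -> Any:
    """Validate a config dict, looking up nested models in the registry."""
    data = dict(config)
    class_path = data.pop("type")
    schema_cls = SCHEMA_REGISTRY.get(class_path)
    if schema_cls is None:
        raise KeyError(f"no schema registered for '{class_path}'")
    for key, value in data.items():
        if _is_typed(value):
            data[key] = validate(value)
        elif isinstance(value, list):
            data[key] = [validate(v) if _is_typed(v) else v for v in value]
    instance = schema_cls(**data)  # TypeError on missing or unknown fields
    for f in fields(instance):
        # Very small stand-in for the type checks pydantic would perform
        if isinstance(f.type, type) and not isinstance(getattr(instance, f.name), f.type):
            raise TypeError(f"{class_path}.{f.name} must be {f.type.__name__}")
    return instance


ring = validate({
    "type": "example.lattice.Ring",
    "name": "SR",
    "elements": [
        {"type": "example.magnets.Quadrupole", "name": "Q1", "strength": 1.5},
    ],
})
```

Because validation is a plain function call that happens before any domain object is built, a caller who trusts the input could skip it entirely, which is the separation described in the motivation.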
Major changes:
All ConfigModels have been separated from the domain classes and turned into schema classes whose only purpose is to define the schema used for validation. This is a major refactoring since the ConfigModels are spread out in the codebase. A temporary legacy handler had to be implemented to handle the packages for the bindings. All domain classes also had to be changed since they now need to explicitly declare their attributes.
A base class ConfigurationSchema has been added that all schemas must inherit from. This is to ensure that all schemas registered in the registry have the minimum required fields and the same behaviour.
The yaml file format has been changed so that the type is defined by the class instead of by the module. This is to be compatible with external packages, where a certain module structure can't be enforced, and to allow several similar classes to be grouped in the same module. A temporary function to translate between the yaml file formats has been added.
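As a rough illustration of what "defining the type by the class" implies, a dotted class path can be resolved to a class with the standard library alone. This is only a sketch: `resolve_class` is a hypothetical helper, not a function from this PR, and the exact form of the type string in the new yaml format is assumed to be `package.module.ClassName`.

```python
import importlib


def resolve_class(class_path: str) -> type:
    """Resolve a dotted path such as 'package.module.ClassName' to the class."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"'{class_path}' is not a dotted class path")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library classes are used here so the example runs without pyaml
ordered_dict_cls = resolve_class("collections.OrderedDict")
fraction_cls = resolve_class("fractions.Fraction")
```

With a class-based type string, several classes can live in the same module and still be addressed individually, which a module-based type string cannot express.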
Example usage:
Examples of the usage are available in https://github.com/python-accelerator-middle-layer/pyaml/tree/schema-registry/examples/validation.
The json schema can be visualized and tested with MetaConfigurator to make it easier to understand what it includes. You can import the schema directly from here: https://github.com/python-accelerator-middle-layer/pyaml/blob/schema-registry/pyaml/validation/schema.json