Fix issue 150: Create data model #158
Conversation
|
How are the png drawings created? I worry without the source it's an additional item that will need to be maintained when a change impacts any element. |
|
Looking at commit d5e85be, the data model seems to be a group of isolated classes. But a closer look at the details reveals many commonalities: we in fact define the same type of data in many different places! Consolidation is needed and will follow in subsequent commits in this PR.
|
As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being. Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples, and provide the JSON Schemas for validation. |
|
@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", so you probably got a notification. |
|
Once it's ready for review, I'll ask contributors to the different parts covered by the data model, as well as some specification maintainers, to review it. |
69989a7 to 4d2c376
|
@Silvanoc,
|
I'm working on it. The generation of the All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes (at least) two arguments:
The generator makes sure that any resource referenced in the |
Yes, for the other elements of the OpenAPI/Swagger specification, maybe we can keep a YAML template that the LinkML generator uses. |
|
The data model currently looks like this: The only elements that haven't been generated with LinkML are the dashed lines, because the references use "hidden" IDs (see #161) that cannot be natively modeled with LinkML. |
5367f5f to 856b95a
|
@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft. |
|
@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing and the risk of someone accidentally missing something. This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page. I like what this enables, but I think we'll need to figure out some way of managing this so we're not causing people to not be able/or want to contribute because of this additional overhead. |
As the data model graphs above show nicely, we already have that complexity, and it is more likely to increase than to decrease. The complexity exists today and is not going to go away. This PR does not introduce complexity; it is a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth. That is really needed, as the PlugFest has shown: we uncovered (very) small inconsistencies here and there that, taken together, break the whole thing. We cannot hide complexity. It's there, and trying to hide even parts of it turns the whole into an inconsistent mess. The only question, IMO, is: what is the right tooling to help us manage that complexity?
This is actually prevented by having rigor here.
Granted, this needs to be made as convenient as possible with automation and tooling. |
@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex. I acknowledge the need for something to keep all the content consistent, but unless we do something to make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people available to help take someone's markdown and update all these files, or we introduce some tooling or AI to make the process easier, we'll need to do something, I think. |
If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. You will have to follow some (probably extra) steps, granted, but that shouldn't force you to understand the whole machinery. It will be a process getting to this stage, but I do not see an alternative to be honest.
Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite. |
|
|
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Avoid example validation silently failing to find the examples and therefore reporting success just because there is nothing to check. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Style and typo fixing Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Document why `default_range` is not set in some of the schemas. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
`required: false` is the default, no need to have it everywhere. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Replace custom OpenAPI generator with LinkML upstream one. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
As of now LinkML does not support Python 3.14 yet. This patch ensures that a supported version is available. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
This commit should be reverted after review and possibly before merging. Otherwise right after merging. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Listing the slots of a superclass in a subclass that modifies their usage (slot_usage) should not be needed. This is probably a bug in LinkML. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
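The commit above about example validation "silently failing to find the examples" describes a common pitfall: a validator that finds zero files reports success. A minimal sketch of the guard, assuming the examples are YAML files below a single directory; the function name, the exception name, and the `*.yaml` pattern are illustrative assumptions, not taken from the PR:

```python
from pathlib import Path


class NoExamplesFoundError(RuntimeError):
    """Raised when a validation run has nothing to validate."""


def collect_examples(examples_dir: Path, pattern: str = "*.yaml") -> list[Path]:
    """Return the example files below examples_dir, failing loudly if none exist.

    Both the directory layout and the file pattern are assumptions made for
    this sketch; the actual PR may organize its examples differently.
    """
    if not examples_dir.is_dir():
        raise NoExamplesFoundError(f"examples directory not found: {examples_dir}")
    files = sorted(examples_dir.rglob(pattern))
    if not files:
        # An empty result is treated as an error rather than as "all valid".
        raise NoExamplesFoundError(
            f"no files matching {pattern!r} under {examples_dir}"
        )
    return files
```

A validation script would call `collect_examples` first and only then loop over the returned paths, so a wrong path or a renamed directory stops the run instead of passing it.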
8a80e62 to 48949d9
|
@Silvanoc I stumbled upon the following issues with using LinkML for this solution:
To resolve this, I see the following potential paths forward:
Solution to Problem 3: To ensure that we follow one single source of truth, we might use something like P.S.: All of these solutions seem to increase the complexity, especially when the documentation is partially human-authored and the rest is auto-generated. Interleaving the two only raises the complexity. If we want a single source of truth for the docs, then IMO we should completely avoid manually written docs. |
@singhmj-1 let me try to address your different points individually:
Yes, I'm trying to repeat it as much as possible to avoid wrong expectations :-) LinkML only models the data! So in the case of an OpenAPI specification, it can only model the resources, not the endpoints. But that should be fine, since the goal of the data model is to ensure consistency among all the different resources that we define in Margo. Everything that is not a resource schema is provided in a sort of OpenAPI template; you can see the current one here (the path is WIP, probably not the best one). I contributed exactly this functionality to LinkML, and it has been available since LinkML v1.11. I could also try to extend the OpenAPI generator (either contributing upstream or maintaining a derived custom tool) so that endpoints, response codes, ... can also be modeled with LinkML. But I'm not convinced that it's needed as of now, or that it's needed at all.
The intention is to have tools to generate those resources, like this one for the OpenAPI specification. Exactly to avoid stale documentation, the data model is the source of truth WRT data and everything derived from it MUST be generated.
LinkML can easily be used as a package in a Python program that generates the workflow diagrams. How are they currently being generated? If they're being generated out of the OpenAPI specification, then it's even easier: first generate the OpenAPI specification with LinkML and then the workflows with the current tool.
That's the current intention.
I would discard this option.
I'm not sure I fully get this option, it sounds to me like Option A.
There is an approved SUP to use a data model (which IMO is not up for discussion) and to use LinkML for that purpose. Do you have an alternative proposal? Provide a SUP so that we can discuss it. And try to provide a rough running example like mine; then you may realize that something else might be even more complex than LinkML. But you can try.
I don't get what "truth" is meant in "one single source of truth" above. In my proposal, it's clear: the whole data model. Your proposal sounds to me like a focus only on the APIs, but we have other resources, and your proposal does not help ensure consistency across all of them.
It's clear that it adds complexity on the tooling side, but it reduces the complexity of keeping things consistent and reduces errors. We already have a complex data model. It is difficult to see when only looking at the web pages, but that complexity is felt as soon as you start working on the details and multiple inconsistencies/incompatibilities are detected. If you look at this diagram, you can see how many interconnected or even shared resources we have spread over the whole documentation. And we also have the examples... We have already experienced how tough it is to keep things consistent. You cannot convince me that maintaining and extending it manually scales, but you can try to convince others to revert my SUP 😉 No matter how much you generate, humans will be writing the real content in whatever format is required (Markdown in a Jinja2 template, a class description in a LinkML document, ...). It's only the integration of the different parts and some formatting aspects (like the creation of a Markdown table out of a list of attributes) that happen without any human input. |
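To illustrate the kind of formatting step mentioned above (creating a Markdown table out of a list of attributes), here is a minimal sketch in plain Python. The column set and the dictionary keys (`name`, `range`, `required`, `description`) are assumptions chosen for illustration; the PR's actual Jinja2 templates may render different columns:

```python
def attributes_to_markdown(attributes: list[dict]) -> str:
    """Render a list of attribute descriptions as a Markdown table.

    Each attribute is a dict with the (assumed) keys `name`, `range`,
    `required` and `description`. A missing `required` key is treated
    as False and a missing `description` as an empty string.
    """
    header = "| Attribute | Type | Required | Description |"
    divider = "| --- | --- | --- | --- |"
    rows = [
        f"| {attr['name']} | {attr['range']} | "
        f"{'Y' if attr.get('required') else 'N'} | {attr.get('description', '')} |"
        for attr in attributes
    ]
    return "\n".join([header, divider, *rows])
```

The human-written part is the content of each description; the table layout itself is derived mechanically and therefore cannot drift between pages.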
|
Thanks for your explanation. And I agree with the following:
And I actually scrolled through modeling frameworks again, but everything appeared complex. And in the end I thought, "why even provide a documentation/spec?!". 😝 😆 Btw, all I mentioned was my experience while working on your solution, and I agree that once everything is in place it will make life easier. And I've made some changes on top of yours over here. For example, considering the OpenAPI generation from the template, we can use something like this: This will never break the OpenAPI schema, and it will still make clear to the reader that these schemas are sourced from LinkML. And the users can name the schemas whatever they like, while the underlying definitions are dynamically rendered from the LinkML source by a custom tool. If interested, my branch has all the code. |
It has an advantage (🟢) and a disadvantage (🔴):
🟢 It supports decoupling the resource name in the data model from the one used in the OpenAPI specification.
🔴 It is not supported by the current version of the LinkML OpenAPI generator (contributed by me).
Is the real motivation for this proposal having a "valid" OpenAPI schema? What for? It might be syntactically correct, but it remains semantically incorrect until the "x-linkml-" elements are replaced. The only added value I see (beyond the name decoupling mentioned above) is that OpenAPI validation tools (e.g. an IDE) won't report an issue, right? |
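The replacement step discussed here can be sketched as follows. This is not the code from either branch: the exact extension key is not shown in this conversation, so `x-linkml-class` is a hypothetical name, as are the schema names in the usage note. The sketch assumes the template keeps its placeholders under `components.schemas` and that the generated schemas are available as a dictionary keyed by LinkML class name:

```python
import copy

# Hypothetical extension key; the conversation only mentions "x-linkml-" elements.
PLACEHOLDER_KEY = "x-linkml-class"


def resolve_placeholders(template: dict, generated: dict) -> dict:
    """Return a copy of the OpenAPI template with placeholders replaced.

    `template` is an OpenAPI document whose `components.schemas` entries may
    carry PLACEHOLDER_KEY naming a LinkML class. `generated` maps LinkML class
    names to the JSON Schema fragments produced from the data model.
    """
    result = copy.deepcopy(template)
    schemas = result.get("components", {}).get("schemas", {})
    for openapi_name, body in list(schemas.items()):
        if not isinstance(body, dict) or PLACEHOLDER_KEY not in body:
            continue  # hand-written schema, left untouched
        linkml_name = body[PLACEHOLDER_KEY]
        if linkml_name not in generated:
            raise KeyError(
                f"{openapi_name}: unknown LinkML class {linkml_name!r}"
            )
        # The OpenAPI name stays; only the definition comes from the data model.
        schemas[openapi_name] = copy.deepcopy(generated[linkml_name])
    return result
```

With a template entry such as `{"Device": {"type": "object", "x-linkml-class": "DeviceResource"}}`, the template stays a syntactically valid OpenAPI document before resolution, and the OpenAPI name `Device` is decoupled from the data model name `DeviceResource`.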
|
@Silvanoc, as the OpenAPI routes refer to schemas that are not yet generated, I think people will make mistakes when referring to them (even after reading the generator tool's documentation). And the IDE validators will not be of much help either. |
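The risk described above (routes referring to schemas that do not exist yet) can be caught mechanically after generation. A minimal sketch, assuming the final OpenAPI document is available as a Python dictionary and that only local references of the form `#/components/schemas/<name>` need checking; the function names are illustrative:

```python
SCHEMA_REF_PREFIX = "#/components/schemas/"


def iter_refs(node):
    """Yield every `$ref` string found anywhere in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def dangling_schema_refs(spec: dict) -> list[str]:
    """Return the sorted local schema references that point at undefined schemas."""
    defined = set(spec.get("components", {}).get("schemas", {}))
    missing = {
        ref
        for ref in iter_refs(spec)
        if ref.startswith(SCHEMA_REF_PREFIX)
        and ref[len(SCHEMA_REF_PREFIX):] not in defined
    }
    return sorted(missing)
```

Running such a check in CI on the generated specification would report a mistyped schema name even when an IDE validator working on the template alone cannot.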
@singhmj-1 good points. I'll implement your proposal and try to get it upstream. I'll add something to your OpenAPI template proposal: a |

Description
Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.
Issues Addressed
#150
Change Type
Please select the relevant options:
Checklist