
Fix issue 150: Create data model#158

Draft
Silvanoc wants to merge 52 commits into
margo:pre-draftfrom
Silvanoc:create-data-model

Conversation

@Silvanoc

@Silvanoc Silvanoc commented Mar 13, 2026

Contributor

Description

Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.

⚠️ IMPORTANT REMARK: the specification webpage generated here is not yet 100% equivalent to the current release. No more effort will be invested in polishing it until SUP specification-enhancements#48 has been approved, and none is needed if it gets rejected.

Issues Addressed

#150

Change Type

Please select the relevant options:

  • Fix (change that resolves an issue)
  • New enhancement (change that adds specification content)
  • Content edits (change that edits existing content)

Checklist

  • I have read the CONTRIBUTING document.
  • My changes adhere to the established patterns, and best practices.

@Silvanoc Silvanoc requested a review from a team as a code owner March 13, 2026 16:04
@Silvanoc Silvanoc marked this pull request as draft March 13, 2026 16:04
@ajcraig

ajcraig commented Mar 13, 2026

Contributor

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained whenever a change impacts any element.

@Silvanoc

Silvanoc commented Mar 13, 2026

Contributor Author

Looking at commit d5e85be, the data model seems to be a group of isolated classes. But looking at the details, many commonalities can be identified. We in fact have many different places defining the same type of data! Consolidation is needed and will follow in later commits in this PR.

image

@Silvanoc

Contributor Author

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained whenever a change impacts any element.

As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being.

Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples and provide the JSON Schemas for validation.

@Silvanoc Silvanoc removed request for a team, nilanjan-samajdar and singhmj-1 March 13, 2026 16:16
@Silvanoc

Contributor Author

@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", so you probably got a notification.

@Silvanoc

Contributor Author

Once it's ready for review, I'll ask contributors to the different parts covered by the data model, as well as some specification maintainers, to review it.

@Silvanoc Silvanoc force-pushed the create-data-model branch 2 times, most recently from 69989a7 to 4d2c376 Compare March 18, 2026 11:43
@nilanjan-samajdar


@Silvanoc,
In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the “components” section of the OpenAPI/Swagger definition?

  • Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
    This approach requires some scripting, but is doable.
  • Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

@Silvanoc

Contributor Author

@Silvanoc, In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the “components” section of the OpenAPI/Swagger definition?

* Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
  This approach requires some scripting, but is doable.

* Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

I'm working on it. The generation of the ./components/schemas section out of LinkML is already working in a prototype, but I'm considering an alternative, since I want to contribute it to LinkML.

All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes two arguments (at least):

  1. The OpenAPI head (metadata, ./paths, ...) as a YAML file.
  2. The LinkML data model as a YAML file too.

The generator makes sure that any resource referenced in the ./paths exists in the data model.
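
A minimal sketch of such a check, assuming the OpenAPI head and the LinkML model have already been loaded into plain dictionaries (for example with a YAML parser). The function names and the sample data are hypothetical and are not part of the LinkML generator; the sketch also only looks at LinkML classes, not at enums or types.

```python
from typing import Any, Iterator

SCHEMA_REF_PREFIX = "#/components/schemas/"


def iter_schema_refs(node: Any) -> Iterator[str]:
    """Yield every schema name referenced via `$ref` anywhere below `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and value.startswith(SCHEMA_REF_PREFIX):
                yield value[len(SCHEMA_REF_PREFIX):]
            else:
                yield from iter_schema_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_schema_refs(item)


def missing_resources(openapi_head: dict, linkml_model: dict) -> set:
    """Return names referenced under `paths` that the data model does not define."""
    known = set(linkml_model.get("classes", {}))
    return set(iter_schema_refs(openapi_head.get("paths", {}))) - known


# Hypothetical inputs standing in for the two YAML files.
head = {"paths": {"/status": {"post": {"requestBody": {"content": {"application/json": {
    "schema": {"$ref": "#/components/schemas/DeploymentStatusManifest"}}}}}}}}
model = {"classes": {"DeploymentStatusManifest": {}}}

# An empty result means every referenced resource exists in the data model.
unresolved = missing_resources(head, model)
```

A generator could abort with a non-zero exit code whenever `unresolved` is not empty, so that a dangling reference never reaches the published specification.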

@nilanjan-samajdar


The generator makes sure that any resource referenced in the ./paths exists in the data model.

Yes, for other elements of the OpenAPI/Swagger, maybe we can keep a template yaml that the LinkML generator uses.

@Silvanoc

Contributor Author

The data model currently looks like this:
DataModel-ClassDiagram

The only things that haven't been generated with LinkML are the dashed lines, because the references use "hidden" IDs (see #161) that cannot be natively modeled with LinkML.

@Silvanoc Silvanoc force-pushed the create-data-model branch 3 times, most recently from 5367f5f to 856b95a Compare March 23, 2026 12:29
@phil-abb phil-abb marked this pull request as ready for review March 26, 2026 15:20
@phil-abb phil-abb marked this pull request as draft March 27, 2026 11:37
@phil-abb

Contributor

@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft.

@phil-abb

Contributor

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing and the risk of someone accidentally missing something. This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page.

I like what this enables, but I think we'll need to figure out some way of managing this so we're not causing people to not be able/or want to contribute because of this additional overhead.

@stormc

stormc commented Mar 27, 2026

Contributor

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing [...]

As can be seen nicely in the data model graphs above, we already do have that complexity, and it's likely to increase even more rather than decrease. That is, the complexity is already there and it's not going to go away.

This is not introducing complexity but a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth – which is really needed as the PlugFest has shown where we uncovered (very) small inconsistencies here and there that in sum break the whole thing.

We cannot hide complexity. It's there, and trying to hide even parts of it turns the whole thing into an inconsistent mess. The only question IMO is: what is the right tooling to help us manage that complexity?

[...] and the risk of someone accidentally missing something.

This is actually prevented by having rigor here.

This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page. [...]
I like what this enables, but I think we'll need to figure out some way of managing this so we're not causing people to not be able/or want to contribute because of this additional overhead.

Granted, this needs to be made as convenient as possible with automation and tooling.

@phil-abb

Contributor

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML, Jinja, understand all the bash and Python scripts, and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

@stormc

stormc commented Mar 27, 2026

Contributor

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML, Jinja, understand all the bash and Python scripts, and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. You will have to follow some (probably extra) steps, granted, but that shouldn't force you to understand the whole machinery. It will be a process getting to this stage, but I do not see an alternative to be honest.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite.

@Silvanoc

Silvanoc commented Apr 2, 2026

Contributor Author

⚠️ I'm generating GitHub Pages in my namespace so that you can see the result. It is still only a draft, so some details don't fit yet. But it's enough to get an impression of the result.

Silvanoc added 22 commits May 20, 2026 19:25
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com> (on each of the 22 commits)

Commit message bodies:

  • Avoid example validation silently failing to find the examples and therefore reporting success just because there is nothing to check.
  • Style and typo fixing
  • Document why `default_range` is not set in some of the schemas.
  • `required: false` is the default, no need to have it everywhere.
  • Replace custom OpenAPI generator with LinkML upstream one.
  • As of now LinkML does not support 3.14 yet. This patch ensures that a supported version is available.
  • This commit should be reverted after review and possibly before merging. Otherwise right after merging.
  • Listing the slots of a superclass in a subclass that modifies their usage (slot_usage) should not be needed. This is probably a bug in LinkML.
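
The commit note above about example validation (it must not report success just because no examples were found) can be sketched as follows. This is only an illustration of the idea, not the code from the PR: the function name `collect_examples`, the `*.yaml` pattern and the temporary directory are assumptions.

```python
import tempfile
from pathlib import Path


def collect_examples(examples_dir: Path, pattern: str = "*.yaml") -> list:
    """Return all example files below `examples_dir`, failing loudly if there are none."""
    examples = sorted(examples_dir.rglob(pattern))
    if not examples:
        # An empty result would otherwise look like "all examples are valid".
        raise FileNotFoundError(f"no examples matching {pattern!r} under {examples_dir}")
    return examples


# Demonstration in a throwaway directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        collect_examples(root)
        empty_dir_rejected = False
    except FileNotFoundError:
        empty_dir_rejected = True
    (root / "deployment.yaml").write_text("apiVersion: example\n")
    found = [path.name for path in collect_examples(root)]
```

Raising an exception instead of returning an empty list makes a misconfigured path show up as a failed validation run rather than a green one.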
@singhmj-1

singhmj-1 commented May 28, 2026

Contributor

@Silvanoc I stumbled upon the following issues with using LinkML for this solution:

  1. Requires manual template upkeep: Since LinkML only defines the data structures, we still need to create an OpenAPI template and manually add API URIs, requests, and response codes. Then, we have to inject the LinkML schema into this template just to create the final spec file.
  2. Cannot generate API docs: Our specification repository includes Markdown documents that outline API routes, requests, and responses. Because LinkML is designed strictly for data modeling rather than API workflows, it cannot generate this content and these files in their current format. Not generating this content will create a risk of stale documentation.
  3. Cannot generate API workflows: There are some doc files in our project (like device-client-onboarding.md) that specify the flow of the APIs using Mermaid diagrams. These cannot be auto-generated via LinkML, nor can this be done easily with an OpenAPI spec.

To resolve this, I see the following potential paths forward:

  • Option A: Edit these Markdown templates manually to seed the API routes and other non-LinkML info. We would use Jinja placeholders to inject the LinkML data, meaning we must maintain both the Markdown templates and the OpenAPI template.

  • Option B: Clean up the Markdown files by completely removing the API route mentions, and rely instead on the interactive Swagger UI already embedded in the site for API routing content.

  • Option C: Automate the Markdown files. We can transition to generating these route details into Markdown automatically by parsing the OpenAPI template we created for LinkML. This will keep the existing pipeline.

  • Option D: Drop LinkML entirely. We use an OpenAPI spec to generate the entire model hierarchy and the Markdown docs, though we lose LinkML's advanced semantic web features.

Solution to Problem 3: To ensure that we follow one single source of truth, we might use something like Arazzo, which can embed the API workflows.

P.S.: All of these solutions seem to increase the complexity, especially when the documentation is partially human-authored and the rest is auto-generated. Interleaving them only raises the complexity. If we want a single source of truth for docs, then IMO we should completely avoid manually written docs.

cc: @nilanjan-samajdar @phil-abb @ajcraig

@Silvanoc

Silvanoc commented Jun 9, 2026

Contributor Author

@Silvanoc I stumbled upon the following issues with using LinkML for this solution:

@singhmj-1 let me try to address your different points individually:

  1. Requires manual template upkeep: Since LinkML only defines the data structures, we still need to create an OpenAPI template and manually add API URIs, requests, and response codes. Then, we have to inject the LinkML schema into this template just to create the final spec file.

Yes, I'm trying to repeat it as much as possible to avoid wrong expectations :-)

LinkML only models the data! So in the case of an OpenAPI specification, it can only model the resources, not the endpoints. But that should be fine, since the goal of the data model is ensuring consistency among all the different resources that we define in Margo.

Everything that is not a resource schema is provided in a sort-of OpenAPI template; you can see the current one here (the path is WIP, probably not the best one). I contributed exactly this functionality to LinkML, and it has been available since LinkML v1.11.

I could also try to extend the OpenAPI generator (either contributing upstream or having a derived custom tool) so that endpoints, response codes, ... can also be modeled with LinkML. But I'm not convinced that it's needed right now, or needed at all.

  2. Cannot generate API docs: Our specification repository includes Markdown documents that outline API routes, requests, and responses. Because LinkML is designed strictly for data modeling rather than API workflows, it cannot generate this content and these files in their current format. Not generating this content will create a risk of stale documentation.

The intention is to have tools to generate those resources, like this one for the OpenAPI specification. Exactly to avoid stale documentation, the data model is the source of truth WRT data and everything derived from it MUST be generated.

  3. Cannot generate API workflows: There are some doc files in our project (like device-client-onboarding.md) that specify the flow of the APIs using Mermaid diagrams. These cannot be auto-generated via LinkML, nor can this be done easily with an OpenAPI spec.

LinkML can be easily used as a package in a Python program that generates the workflow diagrams. How are they currently being generated? If they're being generated out of the OpenAPI specification, then it's even easier: generate first the OpenAPI specification with LinkML and then the workflows with the current tool.

To resolve this, I see the following potential paths forward:

  • Option A: Edit these Markdown templates manually to seed the API routes and other non-LinkML info. We would use Jinja placeholders to inject the LinkML data, meaning we must maintain both the Markdown templates and the OpenAPI template.

That's the current intention.

  • Option B: Clean up the Markdown files by completely removing the API route mentions, and rely instead on the interactive Swagger UI already embedded in the site for API routing content.

I would discard this option.

  • Option C: Automate the Markdown files. We can transition to generating these route details into Markdown automatically by parsing the OpenAPI template we created for LinkML. This will keep the existing pipeline.

I'm not sure I fully get this option, it sounds to me like Option A.

  • Option D: Drop LinkML entirely. We use an OpenAPI spec to generate the entire model hierarchy and the Markdown docs, though we lose LinkML's advanced semantic web features.

There is an approved SUP to use a data model (which IMO is not up for discussion) and to use LinkML for that purpose. Do you have an alternative proposal? Provide a SUP so that we can discuss it. And try to provide a rough running example like mine; then you'll realize that something else might be even more complex than LinkML. But you can try.

Solution to Problem 3: To ensure that we follow one single source of truth, we might use something like Arazzo, which can embed the API workflows.

I don't get what "source of WHAT" is meant in "one single source of truth" above. In my proposal, it's clear: the whole data model. Your proposal sounds to me like a focus only on the APIs, but we have other resources and your proposal does not help ensure consistency across all of them.

P.S.: All of these solutions seem to increase the complexity, especially when the documentation is partially human-authored and the rest is auto-generated. Interleaving them only raises the complexity. If we want a single source of truth for docs, then IMO we should completely avoid manually written docs.

It clearly adds complexity on the tooling side, but it reduces the complexity of keeping things consistent and it reduces errors. We already have a complex data model. That is difficult to see when only looking at the web pages, but the complexity is felt as soon as you start working on the details and multiple inconsistencies/incompatibilities are detected.

If you look at this diagram, you can see how many inter-connected or even shared resources we have spread over the whole documentation. And we have also the examples... We have already experienced how tough it is to keep things consistent. You cannot convince me that maintaining and extending it manually scales, but you can try to convince others to revert my SUP 😉

No matter how much you generate, humans will be writing the real content in whatever format is required (Markdown in a Jinja2 template, a class description in a LinkML document, ...). Only the integration of the different parts and some formatting aspects (like the creation of a Markdown table out of a list of attributes) happen without any human input.
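
As an illustration of the kind of formatting step meant here, the following sketch turns a list of attributes into a Markdown table. The attribute dictionaries, the column names and the function name are hypothetical; in the PR this step is handled by the templates and generators, not by this code.

```python
def attributes_to_markdown(attributes: list) -> str:
    """Render attribute descriptions as a Markdown table.

    Each attribute is a dict with a `name` and optional `range`, `required`
    and `description` keys. Pipe characters inside values are not escaped.
    """
    lines = [
        "| Attribute | Type | Required | Description |",
        "| --- | --- | --- | --- |",
    ]
    for attr in attributes:
        # `required` defaults to false when it is not stated explicitly.
        required = "Y" if attr.get("required", False) else "N"
        lines.append(
            f"| {attr['name']} | {attr.get('range', '')} | {required} | {attr.get('description', '')} |"
        )
    return "\n".join(lines)


# Hypothetical attributes written by a human in the data model.
table = attributes_to_markdown([
    {"name": "deploymentId", "range": "string", "required": True,
     "description": "Identifier of the deployment."},
    {"name": "note", "range": "string"},
])
```

The human-written part is the attribute list (names, types, descriptions); the table layout is the only thing produced mechanically.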

cc: @nilanjan-samajdar @phil-abb @ajcraig

@singhmj-1

Contributor

Thanks for your explanation. And I agree with the following:

something else might be even more complex than LinkML

And I actually scrolled through modeling frameworks again, but everything appeared complex. And in the end I thought, "why even provide a documentation/spec?!". 😝 😆

Btw, all I mentioned was my experience while working on your solution, and I agree that once everything is in place it will make life easier. I've also made some changes on top of yours over here.

For example, considering the OpenAPI generation from the template, we can use something like this:

paths:
  /api/v1/clients/{clientId}/deployments/{deploymentId}/status:
    post:
      summary: Report deployment status
      security:
        - PayloadSignature: []
      parameters:
        - name: clientId
          in: path
          required: true
          schema:
            type: string
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/DeploymentStatusManifest"

components:
  schemas:
    # Placeholders: each entry names the LinkML class it is rendered from.
    DeploymentStatusManifest:
      x-linkml-source-type: class
      x-linkml-source: DeploymentStatusManifest
      x-linkml-schema: ../../data-models/margo.linkml.yaml

    DeviceOnboardingManifest:
      x-linkml-source-type: class
      x-linkml-source: OnboardingRequest
      x-linkml-schema: ../../data-models/margo.linkml.yaml

This will never break the OpenAPI schema, and it still makes clear to the reader that these schemas are sourced from LinkML. Users can name the schemas whatever they like, while the underlying definitions are dynamically rendered from the LinkML source by a custom tool. If interested, my branch has all the code.
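
A rough sketch of what such a custom tool could do with the placeholders, assuming the template has been loaded into a dictionary with the schemas under `components.schemas`, and that a definition has already been rendered for each LinkML class. The function `resolve_placeholders` and the `rendered` mapping are hypothetical and do not reflect the code in the branch.

```python
import copy


def resolve_placeholders(template: dict, rendered: dict) -> dict:
    """Return a copy of `template` with `x-linkml-source` placeholders replaced.

    `rendered` maps a LinkML class name to its already generated schema
    definition. Schemas without a placeholder are kept unchanged.
    """
    result = copy.deepcopy(template)
    schemas = result.get("components", {}).get("schemas", {})
    for name, entry in list(schemas.items()):
        source = entry.get("x-linkml-source")
        if source is None:
            continue  # hand-written schema, nothing to resolve
        if source not in rendered:
            raise KeyError(f"{name}: LinkML class {source!r} not found")
        schemas[name] = copy.deepcopy(rendered[source])
    return result


# Hypothetical template entry and rendered definition.
template = {"components": {"schemas": {"DeviceOnboardingManifest": {
    "x-linkml-source-type": "class",
    "x-linkml-source": "OnboardingRequest",
}}}}
rendered = {"OnboardingRequest": {"type": "object", "properties": {}}}
resolved = resolve_placeholders(template, rendered)
```

The schema keeps the name chosen in the template (`DeviceOnboardingManifest`) while its content comes from the LinkML class (`OnboardingRequest`), which is the name decoupling described above.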

@Silvanoc

Silvanoc commented Jun 9, 2026

Contributor Author

For example, considering the OpenAPI generation from the template, we can use something like this:

paths:
  /api/v1/clients/{clientId}/deployments/{deploymentId}/status:
    post:
      summary: Report deployment status
      security:
        - PayloadSignature: []
      parameters:
        - name: clientId
          in: path
          required: true
          schema:
            type: string
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/DeploymentStatusManifest"

components:
  schemas:
    # Placeholders: each entry names the LinkML class it is rendered from.
    DeploymentStatusManifest:
      x-linkml-source-type: class
      x-linkml-source: DeploymentStatusManifest
      x-linkml-schema: ../../data-models/margo.linkml.yaml

    DeviceOnboardingManifest:
      x-linkml-source-type: class
      x-linkml-source: OnboardingRequest
      x-linkml-schema: ../../data-models/margo.linkml.yaml

This will never break the OpenAPI schema, and it still makes clear to the reader that these schemas are sourced from LinkML. Users can name the schemas whatever they like, while the underlying definitions are dynamically rendered from the LinkML source by a custom tool. If interested, my branch has all the code.

It has an advantage (🟢) and a disadvantage (🔴):

🟢 it supports decoupling the resource name in the data model from that used in the OpenAPI

🔴 it is not supported by the current (contributed by me) version of the LinkML OpenAPI generator

Is the real motivation for this proposal having a "valid" OpenAPI schema? What for? It might be syntactically correct, but it remains semantically incorrect until the "x-linkml-" elements are replaced. The only added value I see (beyond the above-mentioned name decoupling) is that OpenAPI validation tools (e.g. in an IDE) won't report an issue, right?

@singhmj-1

Contributor

@Silvanoc, as the OpenAPI routes refer to schemas that are not yet generated, I think people will make mistakes when referring to them (even after reading the generator tool's documentation). And the IDE validators will not be of much help either.

@Silvanoc

Contributor Author

@Silvanoc, as the OpenAPI routes refer to schemas that are not yet generated, I think people will make mistakes when referring to them (even after reading the generator tool's documentation). And the IDE validators will not be of much help either.

@singhmj-1 good points. I'll implement your proposal and try to get it upstream.

I'll add something to your OpenAPI template proposal: a type and a description, because some OpenAPI validators will otherwise report an issue on those resources.



7 participants