Fix issue 150: Create data model by Silvanoc · Pull Request #158 · margo/specification

Silvanoc · 2026-03-13T16:04:28Z

Description

Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.

⚠️ IMPORTANT REMARK: the specification webpage generated by this PR is not yet fully equivalent to the current release. No further effort will be invested in polishing it until SUP specification-enhancements#48 has been approved. If it gets rejected, no further effort is needed.

Issues Addressed

#150

Change Type

Please select the relevant options:

Fix (change that resolves an issue)
New enhancement (change that adds specification content)
Content edits (change that edits existing content)

Checklist

I have read the CONTRIBUTING document.
My changes adhere to the established patterns, and best practices.

ajcraig · 2026-03-13T16:11:10Z

How are the PNG drawings created? I worry that, without the source, they become an additional item to maintain whenever a change impacts any element.

Silvanoc · 2026-03-13T16:12:44Z

Looking at commit d5e85be, the data model seems to be a group of isolated classes. But looking at the details, many commonalities can be identified: in fact, many different places define the same type of data! Consolidation is needed and will follow in later commits in this PR.

Silvanoc · 2026-03-13T16:15:39Z

> How are the PNG drawings created? I worry that, without the source, they become an additional item to maintain whenever a change impacts any element.

As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being.

Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples and provide the JSON Schemas for validation.

Silvanoc · 2026-03-13T16:17:49Z

@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", so you probably got a notification.

Silvanoc · 2026-03-13T16:20:07Z

Once it's ready for review, I'll ask the contributors to the different parts covered by the data model, as well as some specification maintainers, to review it.

nilanjan-samajdar · 2026-03-19T16:08:09Z

@Silvanoc,
In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?

Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
This approach requires some scripting, but is doable.
Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we could even do away with manually writing the OpenAPI paths/requests/responses and do it all through a script.
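The "Endpoint" idea could be sketched roughly as follows. This is a minimal sketch, not an agreed design. It assumes a hypothetical list of endpoint entries stored next to the data model. Each entry has a path, an HTTP method, an optional request class and a map from status code to response class. `ENDPOINTS`, `endpoints_to_paths` and the class names `Deployment`/`DeploymentList` are illustrative only. They are not part of the margo data model or of LinkML. Each entry becomes an OpenAPI operation whose bodies `$ref` into `#/components/schemas`.

```python
from typing import Any, Optional

# Hypothetical endpoint declarations; class names are illustrative only.
ENDPOINTS: list[dict[str, Any]] = [
    {"path": "/deployments", "method": "get",
     "request": None, "responses": {"200": "DeploymentList"}},
    {"path": "/deployments", "method": "post",
     "request": "Deployment", "responses": {"201": "Deployment", "400": None}},
]


def _json_body(class_name: str) -> dict[str, Any]:
    """JSON media-type map whose schema references a data-model class."""
    return {"application/json": {
        "schema": {"$ref": f"#/components/schemas/{class_name}"}}}


def endpoints_to_paths(endpoints: list[dict[str, Any]]) -> dict[str, Any]:
    """Turn endpoint declarations into an OpenAPI `paths` object."""
    paths: dict[str, dict[str, Any]] = {}
    for ep in endpoints:
        operation: dict[str, Any] = {"responses": {}}
        request: Optional[str] = ep.get("request")
        if request:
            operation["requestBody"] = {"required": True,
                                        "content": _json_body(request)}
        for status, class_name in ep["responses"].items():
            # OpenAPI requires a description on every response object.
            response: dict[str, Any] = {"description": f"HTTP {status}"}
            if class_name:
                response["content"] = _json_body(class_name)
            operation["responses"][status] = response
        # Several methods can share one path item.
        paths.setdefault(ep["path"], {})[ep["method"]] = operation
    return paths
```

The resulting object would go under `paths` in the OpenAPI document, next to the `components/schemas` generated from LinkML. Path parameters, query parameters and security requirements are deliberately left out of this sketch.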

Silvanoc · 2026-03-19T16:23:32Z

> @Silvanoc, In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?
>
> * Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
>   This approach requires some scripting, but is doable.
>
> * Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we could even do away with manually writing the OpenAPI paths/requests/responses and do it all through a script.

I'm working on it. The generation of the ./components/schemas section out of LinkML is already working in a prototype, but I'm considering an alternative, since I want to contribute it to LinkML.
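For readers unfamiliar with the mapping, here is a minimal sketch of what generating `components/schemas` from LinkML involves. It is not Silvanoc's prototype, and it covers only a small subset of LinkML:

* classes with inline `attributes`;
* the `range`, `required` and `multivalued` slot properties;
* a few built-in types;
* the schema-level `default_range`.

It ignores `slots`, `slot_usage`, inheritance (`is_a`, mixins) and enums. LinkML's own `gen-json-schema` generator handles far more. The schema is assumed to be already parsed from YAML into a dict, and `linkml_classes_to_schemas` is a hypothetical name.

```python
from typing import Any

# A few LinkML built-in types and their JSON Schema / OpenAPI equivalents.
_TYPE_MAP: dict[str, dict[str, Any]] = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "float": {"type": "number"},
    "boolean": {"type": "boolean"},
    "uri": {"type": "string", "format": "uri"},
}


def linkml_classes_to_schemas(schema: dict[str, Any]) -> dict[str, Any]:
    """Map LinkML classes (inline attributes only) to OpenAPI schema objects."""
    classes = schema.get("classes", {})
    default_range = schema.get("default_range", "string")
    result: dict[str, Any] = {}
    for class_name, cls in classes.items():
        properties: dict[str, Any] = {}
        required: list[str] = []
        for slot_name, slot in (cls.get("attributes") or {}).items():
            rng = slot.get("range", default_range)
            if rng in classes:
                # Class-valued slot: reference the other component schema.
                item: dict[str, Any] = {"$ref": f"#/components/schemas/{rng}"}
            else:
                # Types not covered by this sketch fall back to string.
                item = dict(_TYPE_MAP.get(rng, {"type": "string"}))
            if slot.get("multivalued"):
                item = {"type": "array", "items": item}
            properties[slot_name] = item
            if slot.get("required"):
                required.append(slot_name)
        obj: dict[str, Any] = {"type": "object", "properties": properties}
        if required:
            obj["required"] = required
        if "description" in cls:
            obj["description"] = cls["description"]
        result[class_name] = obj
    return result
```

The returned mapping is what would sit under `components.schemas` in the final OpenAPI document.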

All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes (at least) two arguments:

* The OpenAPI head (metadata, ./paths, ...) as a YAML file.
* The LinkML data model as a YAML file too.

The generator makes sure that any resource referenced in the ./paths exists in the data model.
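The generator behaviour described above can be sketched in a few lines: take an OpenAPI head and a data model, check that every resource referenced in `./paths` exists, then assemble one document. This is only a sketch, with the following assumptions:

* both inputs are already loaded from YAML into dicts, for example with PyYAML's `yaml.safe_load`;
* the data model has already been converted into a mapping of schema names to schema objects;
* references use the local form `#/components/schemas/<Name>`.

`build_openapi` is a hypothetical helper, not an existing LinkML generator.

```python
import copy
from typing import Any, Iterator

_SCHEMA_PREFIX = "#/components/schemas/"


def _iter_refs(node: Any) -> Iterator[str]:
    """Yield every `$ref` string found anywhere in a nested dict/list tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from _iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_refs(item)


def build_openapi(head: dict[str, Any], schemas: dict[str, Any]) -> dict[str, Any]:
    """Merge model schemas into the head after checking all refs in ./paths."""
    referenced = {
        ref[len(_SCHEMA_PREFIX):]
        for ref in _iter_refs(head.get("paths", {}))
        if ref.startswith(_SCHEMA_PREFIX)
    }
    missing = sorted(referenced - set(schemas))
    if missing:
        raise ValueError(
            "paths reference schemas missing from the data model: "
            + ", ".join(missing))
    # Work on a copy so the caller's head is left untouched.
    doc = copy.deepcopy(head)
    doc.setdefault("components", {}).setdefault("schemas", {}).update(
        copy.deepcopy(schemas))
    return doc
```

Failing with an explicit list of missing names, rather than emitting a document with dangling references, matches the intent that the data model stays the single source of truth for resource definitions.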

nilanjan-samajdar · 2026-03-19T16:36:40Z

> The generator makes sure that any resource referenced in the ./paths exists in the data model.

Yes. For the other elements of the OpenAPI/Swagger definition, maybe we can keep a template YAML file that the LinkML generator uses.

Silvanoc · 2026-03-20T14:51:23Z

The data model currently looks like this:

The only things not generated with LinkML are the dashed lines, because those references use "hidden" IDs (see #161) that cannot be natively modeled in LinkML.

phil-abb · 2026-03-27T11:41:52Z

@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft.

phil-abb · 2026-03-27T11:54:08Z

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing and the risk of someone accidentally missing something. This raises the bar for contributions considerably, given all the additional things someone will need to understand instead of just creating a simple markdown page.

I like what this enables, but I think we'll need to figure out some way of managing this so that the additional overhead doesn't leave people unable or unwilling to contribute.

stormc · 2026-03-27T13:13:14Z

> @Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing [...]

As can be seen nicely in the data model graphs above, we already do have that complexity, and it's likely to increase rather than decrease; i.e., this complexity exists already and it's not going to go away.

This doesn't introduce complexity; it is a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth. That is really needed, as the PlugFest showed: we uncovered (very) small inconsistencies here and there that, taken together, break the whole thing.

We cannot hide complexity; it's there, and trying to hide even parts of it turns the whole into an inconsistent mess. The only question, IMO, is: what is the right tooling to help us manage that complexity?

> [...] and the risk of someone accidentally missing something.

This is actually prevented by having rigor here.

> This raises the bar for contributions considerably, given all the additional things someone will need to understand instead of just creating a simple markdown page. [...]
> I like what this enables, but I think we'll need to figure out some way of managing this so that the additional overhead doesn't leave people unable or unwilling to contribute.

Granted, this needs to be made as convenient as possible with automation and tooling.

phil-abb · 2026-03-27T13:27:53Z

> As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the Bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out which of a bunch of files need to be updated and checked. If they want to create a new page, it's going to be even more complex.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

stormc · 2026-03-27T14:09:26Z

> > As can be seen nicely in the data model graphs above, we already do have that complexity

> @stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the Bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out which of a bunch of files need to be updated and checked. If they want to create a new page, it's going to be even more complex.

If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. Granted, you will have to follow some (probably extra) steps, but that shouldn't force you to understand the whole machinery. Getting to this stage will be a process, but to be honest, I do not see an alternative.

> I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite.

Silvanoc · 2026-04-02T09:13:28Z

⚠️ I'm generating GitHub Pages in my namespace so that you can see the result. It is still only a draft, so some details don't fit yet, but it's enough to get an impression of the result.