Add BDF Toolbox metadata (bdf.yaml + codemeta.json + CITATION.cff)#1403
allaway wants to merge 4 commits into
Adds machine-readable project metadata to satisfy ARPA-H BDF ENHANCE Scorecard checks:
- bdf.yaml: SystemMetadata instance for the ARPA-H-BDF/bdfkb-schema (validated with linkml-validate). Resolves BDF metadata + TRL maturity.
- codemeta.json: CodeMeta/schema.org SoftwareSourceCode descriptor.
- CITATION.cff: Citation File Format 1.2.0 metadata.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tagging @milen-sage for viz
@milen-sage, can you please comment on these aspects of the PR:
title: "Synapse Python Client (synapseclient)"
type: software
authors:
  # TODO: add individual authors (names / ORCIDs) in addition to the organization below.
Which individual authors should go in this section? Really, it would be all of @Sage-Bionetworks/dpe.
  "@type": "SoftwareSourceCode",
  "name": "Synapse Python Client (synapseclient)",
  "description": "A Python client for Sage Bionetworks' Synapse, a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate. The client can be used as a library for software that communicates with Synapse or as a command-line utility.",
  "version": "4.13.0",
This is going to be another annoying spot to maintain: a version value that can easily diverge from the actual deployed version. Is there a way to dynamically bring this version in from https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/synapsePythonClient so we only maintain it in a single place?
# (src/bdfkb_schema/schema/bdfkb_schema.yaml, tree_root: SystemMetadata).
# Validate with:
#   linkml-validate -s <path>/bdfkb_schema.yaml -C SystemMetadata bdf.yaml
version: "4.13.0"
Another version value to maintain. Ideally it can come from https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/synapsePythonClient
credit:
  - name: "Sage Bionetworks"
    email:
      - "platform@sagebase.org" # TODO: confirm preferred BDF contact email
funding:
  source: "TODO: confirm funding source"
  agreement: "TODO: confirm agreement / award number"
  link: "https://example.org/TODO-confirm-funding-link"
Do these need to be fixed?
Addresses review feedback that the version in bdf.yaml and codemeta.json (and CITATION.cff) duplicates the canonical client version and can drift. Adds:
- .github/scripts/sync_version_metadata.py: rewrites the version field in bdf.yaml, codemeta.json, and CITATION.cff to match latestVersion in synapseclient/synapsePythonClient (the single source of truth). Supports --write and --check.
- .github/workflows/sync-version-metadata.yml: on PRs that touch the version file or metadata, runs the sync and commits the fix back to the PR branch (same-repo), or fails with guidance for fork PRs so drift can't merge.

Now the version is maintained in one place; releases that bump synapseclient/synapsePythonClient propagate automatically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
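For readers who have not opened the script, the sync behavior described in this commit message can be sketched roughly as follows. This is not the contents of .github/scripts/sync_version_metadata.py, which is not shown in this conversation. It assumes that synapseclient/synapsePythonClient is a JSON document with a latestVersion key and that each metadata file carries exactly one top-level version field; the function names are hypothetical.

```python
from __future__ import annotations

import json
import re

# Hypothetical patterns, one per metadata file. Each captures
# (prefix, current version, suffix) so only the value is rewritten
# and comments and formatting are left alone.
_YAML_VERSION = re.compile(r'^(version:[ \t]*"?)([^"\s]+)("?)', re.MULTILINE)
_JSON_VERSION = re.compile(r'^([ \t]*"version":[ \t]*")([^"]*)(")', re.MULTILINE)

_PATTERNS = {
    "bdf.yaml": _YAML_VERSION,
    "CITATION.cff": _YAML_VERSION,  # "^version:" does not match "cff-version:"
    "codemeta.json": _JSON_VERSION,
}


def read_latest_version(version_file_text: str) -> str:
    """Assumption: the version file is JSON with a 'latestVersion' key."""
    return json.loads(version_file_text)["latestVersion"]


def sync_version(filename: str, text: str, version: str) -> str:
    """Return text with the first version field set to the given version."""
    pattern = _PATTERNS[filename]
    new_text, count = pattern.subn(
        lambda m: m.group(1) + version + m.group(3), text, count=1
    )
    if count == 0:
        raise ValueError(f"no version field found in {filename}")
    return new_text


def find_drift(files: dict[str, str], version: str) -> list[str]:
    """Check mode: names of files whose version differs from the source of truth."""
    return [
        name for name, text in files.items()
        if sync_version(name, text, version) != text
    ]
```

A write mode would save the result of sync_version back to disk, while a check mode would exit non-zero when find_drift returns any names. Rewriting the value with a regular expression rather than a parse-and-dump round trip keeps the TODO comments in bdf.yaml and CITATION.cff intact.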
on:
  pull_request:
    paths:
      - "synapseclient/synapsePythonClient"
      - "bdf.yaml"
      - "codemeta.json"
      - "CITATION.cff"
      - ".github/scripts/sync_version_metadata.py"
      - ".github/workflows/sync-version-metadata.yml"
This is good in theory, but it doesn't match the workflow we typically follow for updating the version and going through the release process.
Some AI suggestions I was seeing:
- Add a local pre-commit hook (most promising, but it sits outside the typical CI/CD).
- Update the release process to modify the version in 3 places (not the worst, but prone to error).
- Update our release process to remove direct commits/pushes and follow a PR-style approach (I'd rather not take this, so that we can keep following the same process).
- Update the build process in this area to also call the sync_version_metadata.py script. This would be good automation, but it would not affect source control, only the artifact released to PyPI.
synapsePythonClient/.github/workflows/build.yml
Lines 396 to 403 in 7c60563
@allaway I see in https://github.com/ARPA-H-BDF/bdfkb-schema/blob/6afe39430894512131b7f86f75ac165e8db65551/src/bdfkb_schema/schema/bdfkb_schema.yaml#L101-L104 that version is a required field. However, what I am not certain of is whether the version in source control has to be correct, or just the version in the packaged artifact at https://pypi.org/project/synapseclient/.
If source control has to be correct, then I think the most straightforward way is to update our release process to do this manually in 3 spots.
If source control doesn't have to be correct, then packaging the artifact with the correct version/files would be nice for maintainability.
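The second option (correct version only in the packaged artifact) could look roughly like the sketch below. It is an illustration under assumptions, not the existing build.yml step: it assumes synapseclient/synapsePythonClient is JSON with a latestVersion key, that the three metadata files sit at the repo root, and that the build copies them into a staging directory. stamp_metadata and build_dir are hypothetical names.

```python
import json
import re
from pathlib import Path

METADATA_FILES = ("bdf.yaml", "codemeta.json", "CITATION.cff")
VERSION_FILE = Path("synapseclient") / "synapsePythonClient"

# Matches the first top-level `version:` (YAML) or `"version":` (JSON) value.
# The line anchor keeps `cff-version:` in CITATION.cff from matching.
_VERSION_VALUE = re.compile(r'^([ \t]*"?version"?:[ \t]*"?)[^"\s,]+', re.MULTILINE)


def stamp_metadata(repo_root: Path, build_dir: Path) -> str:
    """Write version-stamped copies of the metadata files into build_dir.

    Files under repo_root are only read, so source control is unaffected.
    """
    raw = (repo_root / VERSION_FILE).read_text(encoding="utf-8")
    version = json.loads(raw)["latestVersion"]  # assumed key name
    build_dir.mkdir(parents=True, exist_ok=True)
    for name in METADATA_FILES:
        text = (repo_root / name).read_text(encoding="utf-8")
        stamped = _VERSION_VALUE.sub(
            lambda m: m.group(1) + version, text, count=1
        )
        (build_dir / name).write_text(stamped, encoding="utf-8")
    return version
```

The trade-off matches the one raised above: the published files always carry the released version, but the copies in the repository can still show an older one.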
Summary
Adds machine-readable project metadata to satisfy several ARPA-H BDF ENHANCE Scorecard checks. All three files are new and live at the repo root; no code, dependencies, or workflows are touched.
Files added
- bdf.yaml: SystemMetadata instance conforming to ARPA-H-BDF/bdfkb-schema (src/bdfkb_schema/schema/bdfkb_schema.yaml, root class SystemMetadata), with current_maturity: 8 (≥5).
- codemeta.json: SoftwareSourceCode descriptor.
- CITATION.cff

Validation performed
- bdf.yaml: linkml-validate -s bdfkb_schema.yaml -C SystemMetadata bdf.yaml → No issues found.
- codemeta.json: parses as valid JSON.
- CITATION.cff: parses as valid YAML.
- pre-commit hooks (check-json, check-yaml, trailing-whitespace, end-of-file-fixer, etc.) pass on the new files.

Real values for the client were filled in where known (version 4.13.0, repo URL, https://www.synapse.org, https://python-docs.synapse.org, PyPI URL, REST endpoint repo-prod.prod.sagebase.org/repo/v1, Apache-2.0 license confirmed against LICENSE).

These are placeholders (some are dummy-but-valid values needed to pass schema validation) and must be reviewed before this is considered authoritative:
- bdf.yaml → maturity.current_maturity: 8: confirm official BDF TRL with Sage program lead (must be ≥5 to pass the scorecard).
- bdf.yaml → funding.source, funding.agreement (award number), funding.link: currently TODO placeholders (the link is a dummy valid URL so the file validates).
- bdf.yaml → credit[0].email (platform@sagebase.org): confirm preferred BDF contact email.
- CITATION.cff → individual authors (currently only the "Sage Bionetworks" org entity): see inline # TODO.

Out of scope (intentionally not done)
- .github/workflows secrets handling: that scorecard finding is a false positive; left untouched.

🤖 Generated with Claude Code