feat: implement deposition upload workflow with semantics domain #30

@rorybyrne

Implement the deposition upload workflow end-to-end — from ontology and schema definitions through to published records — with spreadsheet-based metadata entry (template generation + upload/validation).

Frontend pages are tracked separately in #57.

Architecture Overview

Semantics Domain          Deposition Domain                  Downstream
─────────────────         ──────────────────                 ──────────
Ontology ◄──────────┐     Convention ◄──── Deposition ──► Validation ──► Curation ──► Record
Schema ──────────────┘          │               │                            │
  (typed fields ref ontologies) │          (metadata validated       (validator metrics
                                │           against schema)          attached to Record)
                           bundles:
                           - Schema ref
                           - File requirements
                           - Validator refs

Domain boundaries

| Domain | Aggregates | Responsibility |
|---|---|---|
| Semantics (new) | Ontology, Schema | Defining meaning: terms, hierarchies, typed metadata structure |
| Deposition (enhanced) | Convention, Deposition | Submission lifecycle: templates, drafts, uploads, submission |
| Validation (existing) | ValidationRun | Execute validators against submissions; emit structured metrics/metadata |
| Curation (existing) | | Human/auto approval gate |
| Record (existing) | Record | Immutable published records, with validator-emitted metrics attached |

Semantics Domain

New domain: osa/domain/semantics/

Ontology

Replaces the spec's "Vocabulary" concept. A versioned collection of Terms with optional hierarchical relationships. Can be imported from external sources (UBERON, Mondo, NCBI Taxonomy) or created within OSA.

Note: The existing VocabSRN type will be removed and replaced with OntologySRN. We are pre-launch so no deprecation is needed.

SRN: urn:osa:{node}:onto:{id}@{version}

Each Term has:

  • term_id: canonical identifier (e.g. UBERON:0001950)
  • label: human-readable name (e.g. "neocortex")
  • synonyms: alternative names
  • parent_ids: optional is_a / part_of relationships (empty for flat ontologies)
  • definition: optional description
  • deprecated: whether the term is deprecated

A flat ontology (like osa:sex with 4 terms) and a deep hierarchy (like UBERON with 16k terms) use the same abstraction. Flat is just "no parent relationships."
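
A minimal sketch of how the Term fields above could map onto Python dataclasses, to illustrate that flat and hierarchical ontologies share one shape. The class layout, the `is_flat` helper, and the `osa:sex:*` and `DEMO:*` term IDs are assumptions for illustration, not the actual OSA model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    term_id: str                        # canonical identifier, e.g. "UBERON:0001950"
    label: str                          # human-readable name, e.g. "neocortex"
    synonyms: tuple[str, ...] = ()      # alternative names
    parent_ids: tuple[str, ...] = ()    # is_a / part_of parents; empty when flat
    definition: str | None = None       # optional description
    deprecated: bool = False


@dataclass
class Ontology:
    srn: str
    terms: dict[str, Term] = field(default_factory=dict)

    def add(self, term: Term) -> None:
        self.terms[term.term_id] = term

    def is_flat(self) -> bool:
        # "Flat" simply means no term declares a parent.
        return all(not t.parent_ids for t in self.terms.values())


# Flat ontology: the osa:sex baseline (term_id format is hypothetical).
sex = Ontology("urn:osa:localhost:onto:sex@1")
for name in ("male", "female", "mixed", "unknown"):
    sex.add(Term(term_id=f"osa:sex:{name}", label=name))

# Hierarchical ontology: same classes, but terms carry parent_ids.
demo = Ontology("urn:osa:localhost:onto:demo@1")
demo.add(Term(term_id="DEMO:1", label="organ"))
demo.add(Term(term_id="DEMO:2", label="brain", parent_ids=("DEMO:1",)))
```

Traversing the hierarchy (ancestors, descendants) is deliberately left out, since hierarchical queries are listed as out of scope for this issue.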

Baseline ontologies (for initial testing — real ontology imports like UBERON/NCBI Taxonomy are a future issue):

  • osa:sex — male, female, mixed, unknown
  • osa:boolean — true, false
  • osa:data-format — common scientific file formats
  • osa:license — open data licenses

Schema

Defines the metadata structure for a class of depositions. A list of typed fields where categorical data is ontology-backed.

SRN: urn:osa:{node}:schema:{id}@{version}

Field types:

| Type | Purpose | Constraints |
|---|---|---|
| text | Free text (title, description) | Optional min/max length, regex |
| number | Numeric (sample count, age) | Optional unit, range, integer_only |
| date | Temporal (collection date) | ISO format |
| boolean | Flag | |
| term | Ontology-backed categorical value | References an Ontology SRN; optional root_term constraint |
| url | Links (DOI, ORCID) | Valid URL, optional pattern |

Each field specifies: name, type, type-specific config, required (bool), cardinality (exactly_one, one_or_more, zero_or_more), description.

Example schema:

{
  "srn": "urn:osa:localhost:schema:scrnaseq-metadata@1",
  "title": "scRNA-seq Metadata",
  "fields": [
    { "name": "title", "type": "text", "required": true, "cardinality": "exactly_one", "description": "Dataset title" },
    { "name": "tissue", "type": "term", "ontology": "urn:osa:localhost:onto:uberon@1", "root_term": "UBERON:0000479", "required": true, "cardinality": "one_or_more" },
    { "name": "organism", "type": "term", "ontology": "urn:osa:localhost:onto:ncbi-taxonomy@1", "required": true, "cardinality": "exactly_one" },
    { "name": "sex", "type": "term", "ontology": "urn:osa:localhost:onto:sex@1", "required": true, "cardinality": "exactly_one" },
    { "name": "sample_count", "type": "number", "integer_only": true, "range": [1, null], "required": true, "cardinality": "exactly_one" },
    { "name": "collection_date", "type": "date", "required": false, "cardinality": "exactly_one" },
    { "name": "doi", "type": "url", "pattern": "^https://doi.org/", "required": false, "cardinality": "exactly_one" }
  ]
}
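
As a rough illustration, a schema document shaped like the example above could be loaded into typed field definitions as follows. `FieldDef`, `parse_field`, and `parse_schema` are hypothetical names, and treating every non-common key as type-specific config is an assumption about how the config would be stored.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any

FIELD_TYPES = {"text", "number", "date", "boolean", "term", "url"}
CARDINALITIES = {"exactly_one", "one_or_more", "zero_or_more"}
COMMON_KEYS = {"name", "type", "required", "cardinality", "description"}


@dataclass
class FieldDef:
    name: str
    type: str
    required: bool
    cardinality: str
    description: str = ""
    config: dict[str, Any] = field(default_factory=dict)  # type-specific keys


def parse_field(raw: dict[str, Any]) -> FieldDef:
    if raw["type"] not in FIELD_TYPES:
        raise ValueError(f"unknown field type: {raw['type']!r}")
    if raw["cardinality"] not in CARDINALITIES:
        raise ValueError(f"unknown cardinality: {raw['cardinality']!r}")
    if raw["type"] == "term" and "ontology" not in raw:
        raise ValueError(f"term field {raw['name']!r} must reference an ontology")
    # Everything that is not a common key (ontology, root_term, range, ...)
    # is kept as type-specific config.
    config = {k: v for k, v in raw.items() if k not in COMMON_KEYS}
    return FieldDef(
        name=raw["name"],
        type=raw["type"],
        required=raw["required"],
        cardinality=raw["cardinality"],
        description=raw.get("description", ""),
        config=config,
    )


def parse_schema(text: str) -> list[FieldDef]:
    document = json.loads(text)
    return [parse_field(raw) for raw in document["fields"]]
```

Rejecting unknown types and cardinalities at parse time means later stages (template generation, upload validation) can assume every field is well-formed.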

Deposition Domain (enhanced)

Convention

Lives in the Deposition domain. The user-facing "submission template" — what a depositor selects to know what's expected.

SRN: urn:osa:{node}:conv:{id}@{version}
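
The three SRN forms in this issue (`onto`, `schema`, `conv`) share the layout `urn:osa:{node}:{type}:{id}@{version}`, so one parser could cover all of them. The sketch below is an assumption about that layout: it treats the node and id as colon-free strings and keeps the version as an opaque string. `SRN` and `parse_srn` are hypothetical names, not existing OSA types.

```python
import re
from dataclasses import dataclass

# Assumed grammar: no ":" or "@" inside node, id, or version.
SRN_PATTERN = re.compile(
    r"^urn:osa:(?P<node>[^:@]+):(?P<kind>onto|schema|conv):"
    r"(?P<id>[^:@]+)@(?P<version>[^:@]+)$"
)


@dataclass(frozen=True)
class SRN:
    node: str
    kind: str      # "onto", "schema", or "conv"
    id: str
    version: str   # kept as a string; numeric versions are not assumed

    def __str__(self) -> str:
        return f"urn:osa:{self.node}:{self.kind}:{self.id}@{self.version}"


def parse_srn(text: str) -> SRN:
    match = SRN_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid SRN: {text!r}")
    return SRN(**match.groupdict())
```

For example, `parse_srn("urn:osa:localhost:schema:scrnaseq-metadata@1")` yields an `SRN` whose `kind` is `"schema"` and whose `id` is `"scrnaseq-metadata"`.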

A Convention bundles:

  • Schema reference (SRN) — what metadata to provide
  • Validator references — which OCI validators to run on submission
  • File requirements — accepted file types, min/max count, size limits
  • Description — human-readable explanation of what this convention is for
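
One possible shape for the bundle listed above, with a small check of uploaded files against the file requirements. All field names (`accepted_extensions`, `max_size_bytes`, and so on) and the `check_files` helper are assumptions; the issue only states that file types, counts, and size limits are constrained.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileRequirements:
    accepted_extensions: tuple[str, ...]   # e.g. (".h5ad",)
    min_count: int = 1
    max_count: int | None = None           # None means no upper bound
    max_size_bytes: int | None = None      # None means no size limit


@dataclass(frozen=True)
class Convention:
    srn: str
    schema_srn: str                        # what metadata to provide
    validator_refs: tuple[str, ...]        # OCI validator references
    file_requirements: FileRequirements
    description: str = ""


def check_files(req: FileRequirements, files: dict[str, int]) -> list[str]:
    """Check files (filename -> size in bytes); return a list of problems."""
    problems: list[str] = []
    if len(files) < req.min_count:
        problems.append(f"expected at least {req.min_count} file(s)")
    if req.max_count is not None and len(files) > req.max_count:
        problems.append(f"expected at most {req.max_count} file(s)")
    for name, size in files.items():
        # str.endswith accepts a tuple of suffixes.
        if not name.lower().endswith(req.accepted_extensions):
            problems.append(f"{name}: file type not accepted")
        if req.max_size_bytes is not None and size > req.max_size_bytes:
            problems.append(f"{name}: exceeds size limit")
    return problems
```

Returning a list of problems instead of raising on the first one lets the depositor see everything that needs fixing in a single pass.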

Deposition lifecycle

DRAFT ──► SUBMITTED ──► IN_REVIEW ──► ACCEPTED ──► RECORD
  │                                       │
editable                              published
add files                             immutable

  • Draft: Metadata and files are editable. Created against a Convention.
  • Submitted: Immutable. Validation pipeline triggered automatically.
  • In Review: Validation complete. Awaiting curation decision.
  • Accepted: Curator approved. Record created with validator metrics attached.

Note: Uses the existing DepositionStatus values (DRAFT, SUBMITTED, IN_REVIEW, ACCEPTED, REJECTED).
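
The lifecycle can be read as a small state machine. The sketch below uses a stand-in enum with the same five values as the existing `DepositionStatus`; the transition table is an assumption (in particular, that `REJECTED` is reached only from `IN_REVIEW` and that both `ACCEPTED` and `REJECTED` are terminal), since the diagram does not show the rejection path.

```python
from enum import Enum


class DepositionStatus(Enum):
    # Stand-in for the existing enum; values as listed in the note above.
    DRAFT = "DRAFT"
    SUBMITTED = "SUBMITTED"
    IN_REVIEW = "IN_REVIEW"
    ACCEPTED = "ACCEPTED"
    REJECTED = "REJECTED"


# Assumed transition table.
ALLOWED_TRANSITIONS = {
    DepositionStatus.DRAFT: {DepositionStatus.SUBMITTED},
    DepositionStatus.SUBMITTED: {DepositionStatus.IN_REVIEW},
    DepositionStatus.IN_REVIEW: {DepositionStatus.ACCEPTED, DepositionStatus.REJECTED},
    DepositionStatus.ACCEPTED: set(),
    DepositionStatus.REJECTED: set(),
}


def transition(current: DepositionStatus, target: DepositionStatus) -> DepositionStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def is_editable(status: DepositionStatus) -> bool:
    # Only drafts accept metadata edits and file changes.
    return status is DepositionStatus.DRAFT
```

Keeping the allowed moves in one table makes it straightforward to reject an edit or a resubmission once a deposition has left `DRAFT`.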

Validators and Record Metrics

Validators are OCI containers referenced by a Convention. When a deposition is submitted, the validators defined in its Convention are executed. Validators emit structured results (metrics, metadata, checks) which are attached to the published Record.

This means:

  • No separate "Trait" abstraction is needed — validator results are the searchable surface
  • Records carry rich, queryable validator output (e.g. "cell_viability: 94.2%") rather than binary stamps
  • Search can filter over validator-emitted metrics directly (e.g. "show me records where cell_viability > 90%")
  • A boolean "all checks passed" can be derived at query time without baking it into the model

Design note: The Trait concept from the spec is deferred. If a formal Trait registry is needed later, it can be layered on top of the validator→metrics model without breaking changes.
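
To make the "validator results are the searchable surface" idea concrete, here is an in-memory sketch of filtering records by a metric and deriving "all checks passed" at query time. The `ValidatorResult` and `Record` shapes, the `rec` SRN layout, and the validator and metric names are illustrative assumptions; a real implementation would query the database and not a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorResult:
    validator: str                 # hypothetical validator name
    metrics: dict[str, float]      # e.g. {"cell_viability": 94.2}
    checks: dict[str, bool]        # named pass/fail checks


@dataclass(frozen=True)
class Record:
    srn: str
    results: tuple[ValidatorResult, ...]

    def metric(self, name: str) -> float | None:
        # First validator that reports the metric wins.
        for result in self.results:
            if name in result.metrics:
                return result.metrics[name]
        return None

    def all_checks_passed(self) -> bool:
        # Derived on demand; nothing is stored on the record.
        return all(ok for r in self.results for ok in r.checks.values())


def where_metric_gt(records: list[Record], name: str, threshold: float) -> list[Record]:
    matches = []
    for record in records:
        value = record.metric(name)
        if value is not None and value > threshold:
            matches.append(record)
    return matches


records = [
    Record("urn:osa:localhost:rec:abc123@1",
           (ValidatorResult("qc", {"cell_viability": 94.2}, {"has_counts": True}),)),
    Record("urn:osa:localhost:rec:def456@1",
           (ValidatorResult("qc", {"cell_viability": 71.0}, {"has_counts": False}),)),
]
viable = where_metric_gt(records, "cell_viability", 90.0)
```

Here `viable` contains only the first record, and `all_checks_passed()` is true for that record and false for the second.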

Record Versioning — Out of Scope

Record versioning (updating existing records via new depositions, rec:abc123@2) is deferred to a future issue. All published records will be @1 for now.

Spreadsheet-Based Metadata

Template generation

Server generates .xlsx from a Convention's Schema:

  • Column headers = field names
  • Description row = field descriptions, type info, requirements
  • For small ontology-backed fields (sex, license, boolean): Excel data validation dropdowns
  • For large ontology-backed fields (tissue, disease): column header includes ontology name + instructions to enter term IDs
  • Required fields visually distinguished (bold headers, coloured columns)
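
The column rules above could be planned independently of the spreadsheet library that eventually writes the .xlsx. The sketch below only computes a per-column plan; the `ColumnPlan` type, the `plan_column` helper, and the 50-term cutoff between "small" and "large" ontologies are all assumptions, and an ontology whose terms are not loaded is treated as large.

```python
from __future__ import annotations

from dataclasses import dataclass

DROPDOWN_MAX_TERMS = 50  # assumed cutoff; the issue does not define "small"


@dataclass(frozen=True)
class ColumnPlan:
    header: str                        # text for the header row
    description: str                   # text for the description row
    bold: bool                         # required fields get bold headers
    dropdown: tuple[str, ...] | None   # allowed values, or None for free entry


def plan_column(spec: dict, ontology_labels: dict[str, list[str]]) -> ColumnPlan:
    """Plan one column; ontology_labels maps ontology SRN -> term labels."""
    header = spec["name"]
    dropdown = None
    if spec["type"] == "term":
        labels = ontology_labels.get(spec["ontology"])
        if labels is not None and len(labels) <= DROPDOWN_MAX_TERMS:
            dropdown = tuple(labels)
        else:
            # Large ontology: name it in the header and ask for term IDs.
            header = f"{spec['name']} [{spec['ontology']}: enter term IDs]"
    parts = [
        spec.get("description", ""),
        f"type: {spec['type']}",
        "required" if spec["required"] else "optional",
    ]
    return ColumnPlan(
        header=header,
        description="; ".join(p for p in parts if p),
        bold=bool(spec["required"]),
        dropdown=dropdown,
    )
```

A writer built on any .xlsx library could then turn each `dropdown` tuple into a data validation list and apply `bold` to the header cell.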

Upload and validation

  • Parse .xlsx server-side, validate each cell against the Schema
  • Type checking (text, number, date, boolean, url)
  • Term validation (is this a valid term in the referenced ontology?)
  • Required field checking
  • Cardinality checking
  • Return structured validation results for display
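
The checks listed above could be combined into one row validator that returns structured errors. This sketch makes several assumptions: cell values arrive as strings, multi-valued cells use `;` as a separator (text and url cells are never split), an empty cell is an error only when the field is required, and `ontologies` maps an ontology SRN to the set of accepted cell values. `CellError`, `check_value`, and `validate_row` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

MULTI_VALUE_SEP = ";"  # assumed separator for multi-valued cells


@dataclass(frozen=True)
class CellError:
    row: int
    column: str
    message: str


def check_value(spec: dict, value: str, ontologies: dict[str, set[str]]) -> str | None:
    """Return an error message for one value, or None if it is acceptable."""
    kind = spec["type"]
    if kind == "number":
        try:
            number = float(value)
        except ValueError:
            return "not a number"
        if spec.get("integer_only") and not number.is_integer():
            return "must be an integer"
        low, high = spec.get("range", [None, None])
        if low is not None and number < low:
            return f"below minimum {low}"
        if high is not None and number > high:
            return f"above maximum {high}"
    elif kind == "date":
        try:
            date.fromisoformat(value)
        except ValueError:
            return "not an ISO date"
    elif kind == "boolean":
        if value.lower() not in {"true", "false"}:
            return "must be true or false"
    elif kind == "url":
        parsed = urlparse(value)
        if not (parsed.scheme and parsed.netloc):
            return "not a valid URL"
        pattern = spec.get("pattern")
        if pattern and not re.search(pattern, value):
            return "does not match the required pattern"
    elif kind == "term":
        if value not in ontologies.get(spec["ontology"], set()):
            return "unknown term"
    return None  # text: length/regex constraints omitted in this sketch


def validate_row(row_number: int, row: dict[str, str], fields: list[dict],
                 ontologies: dict[str, set[str]]) -> list[CellError]:
    errors: list[CellError] = []
    for spec in fields:
        name = spec["name"]
        raw = (row.get(name) or "").strip()
        if spec["type"] in {"text", "url"}:
            values = [raw] if raw else []
        else:
            values = [v.strip() for v in raw.split(MULTI_VALUE_SEP) if v.strip()]
        if not values:
            if spec["required"]:
                errors.append(CellError(row_number, name, "required value missing"))
            continue
        if spec["cardinality"] == "exactly_one" and len(values) > 1:
            errors.append(CellError(row_number, name, "expected a single value"))
            continue
        for value in values:
            message = check_value(spec, value, ontologies)
            if message:
                errors.append(CellError(row_number, name, message))
    return errors
```

Each `CellError` carries the row and column, so a client can highlight the offending cell without re-parsing messages.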

File Storage

FileStore (port)
├── LocalFileStore        # Development
└── S3FileStore           # Production (S3-compatible, future)

  • Draft files: temporary, deletable
  • Published files: permanent, immutable
  • Checksum verification on upload
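
A sketch of what the port and the development adapter might look like, with checksum verification on upload. The method names, the use of SHA-256, and the key layout are assumptions; the issue only specifies a `FileStore` port, a `LocalFileStore`, and that checksums are verified.

```python
import hashlib
from pathlib import Path
from typing import Protocol


class ChecksumMismatch(ValueError):
    """Raised when uploaded bytes do not match the declared checksum."""


class FileStore(Protocol):
    # The port: adapters only need to provide these methods.
    def put(self, key: str, data: bytes, expected_sha256: str) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class LocalFileStore:
    """Development adapter that stores files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys such as "../x" that would escape the root directory.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes, expected_sha256: str) -> None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_sha256.lower():
            raise ChecksumMismatch(f"{key}: expected {expected_sha256}, got {actual}")
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        # Intended for draft files only; published files stay immutable.
        self._path(key).unlink()
```

The draft/published distinction is not enforced here; one option would be a key prefix per state, with `delete` rejecting keys under the published prefix.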

Out of Scope (future issues)

  • Frontend pages (convention catalog, upload page, dashboard): tracked in #57
  • Real ontology imports (UBERON, NCBI Taxonomy via OWL/OBO pipeline)
  • Ontology browser UI (rich typeahead, tree explorer)
  • Ontology term search / autocomplete
  • Guided wizard / form-based metadata entry
  • Hierarchical ontology queries (closure table traversal)
  • Embedding-based fuzzy term resolution
  • S3FileStore adapter
  • Resumable uploads
  • Collaboration on drafts
  • Batch deposition (multiple rows = multiple depositions)
  • Record versioning (updating existing records)
  • Trait registry abstraction

Dependencies

Acceptance Criteria

  • Ontologies can be created with terms (flat, via API) using a basic test ontology
  • Schemas can be created with typed fields referencing ontologies
  • Conventions can be created bundling a schema + validator refs + file requirements
  • Template .xlsx can be downloaded for a convention
  • Filled spreadsheet can be uploaded and validated against schema
  • Term fields are validated against referenced ontologies
  • Deposition can be created, metadata provided, files uploaded, and submitted
  • Submitted deposition triggers validators defined in its convention
  • Validator results are attached to published Record as queryable metrics
  • Full pipeline works: submit → validate → curate → record
  • Only owner can edit/submit their depositions
