A domain-agnostic archive for AI-ready scientific data
⚠️ Under active development — OSA is pre-release software. APIs, data formats, and configuration will change without notice. Not yet suitable for production use or external contributions.
OSA is an open protocol for scientific data deposition, validation, publication, discovery, and export, together with its reference implementation. Its aim is to stand up PDB-level data infrastructure for any scientific domain.
Convention-driven submissions: Conventions bundle a metadata schema, validators, and file requirements into a single submission target.
Pluggable validation: Validators are OCI containers with a filesystem I/O contract and no network access by default. Domain experts define quality checks; OSA runs them.
Structured Resource Names: Globally unique, node-scoped identifiers with clear versioning.
Federation-ready: Nodes are identified by DNS domain. Records flow between nodes via import, fork, and mirror, preserving provenance.
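The filesystem I/O contract for validators can be pictured with a minimal Python entrypoint. This is a sketch under assumptions, not OSA's documented contract. The following are all hypothetical:

- the environment variables OSA_INPUT_DIR and OSA_OUTPUT_DIR;
- the file names metadata.json and result.json;
- the required fields.

What it illustrates is the shape of the contract: the validator reads files from one directory and writes a verdict to another, so it never needs the network.

```python
# Hypothetical validator entrypoint: reads a submission from one directory and
# writes a JSON verdict to another. Env var names, file names, and the required
# fields are assumptions for illustration, not OSA's documented contract.
import json
import os
from pathlib import Path

REQUIRED_FIELDS = ("title", "organism")  # hypothetical domain-specific check


def validate(input_dir: Path) -> dict:
    """Check metadata.json in input_dir and return a verdict."""
    errors = []
    meta_path = input_dir / "metadata.json"
    if not meta_path.is_file():
        errors.append("metadata.json is missing")
    else:
        metadata = json.loads(meta_path.read_text())
        for name in REQUIRED_FIELDS:
            if not metadata.get(name):
                errors.append(f"missing required field: {name}")
    return {"valid": not errors, "errors": errors}


def main() -> None:
    # Hypothetical env vars; the defaults are placeholder mount points.
    input_dir = Path(os.environ.get("OSA_INPUT_DIR", "/input"))
    output_dir = Path(os.environ.get("OSA_OUTPUT_DIR", "/output"))
    output_dir.mkdir(parents=True, exist_ok=True)
    verdict = validate(input_dir)
    # Only local files are read and written; nothing here touches the network.
    (output_dir / "result.json").write_text(json.dumps(verdict, indent=2))


# The container image's entrypoint would call main().
```

Keeping the check logic in a pure validate() function makes it testable outside a container. Only main() knows about mount points.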
You don't need to clone this repo to run an OSA archive. The Python SDK (osa-py) ships the whole stack — Postgres, server, and a docker-socket-proxy, brought up with one command.
```bash
pip install osa-py
osa init my-archive
cd my-archive
osa start
```
osa start spins up the stack via Docker Compose and mints a SUPERADMIN dev token, so the CLI is authenticated immediately.
Define a convention in Python (schema + validation hooks + ingester), then:
```bash
osa deploy            # build hook images, register the convention
osa ingestion start   # pull records via the ingester
```
The full SDK reference lives in the osa-py README. It covers schemas, hooks, ingesters, the osa test end-to-end harness, and the convention manifest.
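As a rough mental model of what a convention bundles, here is a plain-Python sketch. It is not the osa-py API; the real schema, hook, and ingester definitions are documented in the osa-py README. The names Convention, FileRequirement, and check_submission are hypothetical, as are the example fields and the image reference. The sketch only shows how a schema and file requirements could be checked up front, before any validator container runs.

```python
# Plain-Python model of what a convention bundles. NOT the osa-py API:
# Convention, FileRequirement, and check_submission are hypothetical names.
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class FileRequirement:
    pattern: str        # glob that submitted file names are matched against
    min_count: int = 1  # minimum number of matching files


@dataclass(frozen=True)
class Convention:
    name: str
    schema: dict[str, type]                             # field name -> expected Python type
    validator_images: tuple[str, ...] = ()              # OCI images run as validation hooks
    file_requirements: tuple[FileRequirement, ...] = ()

    def check_submission(self, metadata: dict, filenames: list[str]) -> list[str]:
        """Return structural problems found before any validator container runs."""
        problems = []
        for key, expected in self.schema.items():
            if key not in metadata:
                problems.append(f"missing field '{key}'")
            elif not isinstance(metadata[key], expected):
                problems.append(f"field '{key}' should be {expected.__name__}")
        for req in self.file_requirements:
            matches = [f for f in filenames if fnmatch(f, req.pattern)]
            if len(matches) < req.min_count:
                problems.append(f"need {req.min_count}+ file(s) matching '{req.pattern}'")
        return problems


pockets = Convention(
    name="protein-pockets",
    schema={"pdb_id": str, "resolution": float},
    validator_images=("registry.example.org/pocket-check:0.1",),  # hypothetical image
    file_requirements=(FileRequirement("*.pdb"),),
)
print(pockets.check_submission({"pdb_id": "1ABC"}, ["notes.txt"]))
```

Cheap structural checks like these can reject a malformed submission before any container is started. Domain-specific quality checks stay in the validator images.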
Working on the server itself:
```bash
git clone https://github.com/opensciencearchive/server.git
cd server
just dev   # Postgres + server + web with hot-reload
```
Run tests: cd server && just test. Lint + type check: just lint.
```
osa/
├── server/                  # Python backend (FastAPI)
│   ├── osa/
│   │   ├── domain/          # DDD bounded contexts
│   │   ├── application/     # API routes, DI wiring
│   │   └── infrastructure/  # Adapters (DB, K8s, S3)
│   ├── tests/               # Unit + integration tests
│   ├── migrations/          # Alembic migrations
│   └── sources/             # Data source plugins
├── web/                     # Next.js frontend
│   └── src/                 # React components, pages
└── deploy/                  # Docker Compose orchestration
```
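The tree labels infrastructure/ as "Adapters", which suggests a ports-and-adapters layering. The sketch below assumes that layering:

- the domain declares a port;
- infrastructure implements it;
- the application layer uses only the port and gets a concrete adapter through wiring.

Record, RecordRepository, InMemoryRecordRepository, and publish are hypothetical names, not classes from server/osa/.

```python
# Ports-and-adapters sketch of the domain / application / infrastructure split.
# Record, RecordRepository, InMemoryRecordRepository, and publish are hypothetical.
from dataclasses import dataclass
from typing import Optional, Protocol


# domain/: the model and the port it depends on (no database imports here)
@dataclass(frozen=True)
class Record:
    srn: str
    version: int
    metadata: dict


class RecordRepository(Protocol):
    def save(self, record: Record) -> None: ...
    def get(self, srn: str, version: int) -> Optional[Record]: ...


# infrastructure/: one adapter for the port; a Postgres-backed one would sit alongside it
class InMemoryRecordRepository:
    def __init__(self) -> None:
        self._rows: dict[tuple[str, int], Record] = {}

    def save(self, record: Record) -> None:
        key = (record.srn, record.version)
        if key in self._rows:
            # Published versions are never overwritten; a change means a new version.
            raise ValueError(f"{record.srn} v{record.version} already exists")
        self._rows[key] = record

    def get(self, srn: str, version: int) -> Optional[Record]:
        return self._rows.get((srn, version))


# application/: a use case that sees only the port; DI wiring picks the adapter
def publish(repo: RecordRepository, srn: str, metadata: dict) -> Record:
    version = 1
    while repo.get(srn, version) is not None:
        version += 1
    record = Record(srn=srn, version=version, metadata=metadata)
    repo.save(record)
    return record


repo = InMemoryRecordRepository()
first = publish(repo, "demo-record", {"title": "first"})
second = publish(repo, "demo-record", {"title": "revised"})
print(first.version, second.version)
```

Because publish depends only on the RecordRepository protocol, unit tests can use the in-memory adapter. A database-backed adapter can be swapped in through DI wiring without touching domain code.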
```
Deposition ─→ Validation ─→ Curation ─→ Record ─→ Search & Export
draft         OCI hooks     approve/    immutable indexed,
metadata      structured    reject      versioned exportable
+ files       checks                    published
```
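The pipeline above can be read as a small state machine. In this sketch, the stage names come from the diagram, but the transition table is an assumption. It includes a hypothetical REJECTED terminal state for the curation "reject" outcome, and a loop from validation back to deposition when checks fail. Search & Export is treated as something done to published records rather than as a further stage.

```python
# State-machine reading of the pipeline. Stage names follow the diagram; the
# transition table (including REJECTED and the validation -> deposition loop)
# is an assumption for illustration, not OSA's actual workflow rules.
from enum import Enum


class Stage(Enum):
    DEPOSITION = "deposition"  # draft metadata + files
    VALIDATION = "validation"  # OCI hooks run structured checks
    CURATION = "curation"      # a curator approves or rejects
    RECORD = "record"          # immutable, versioned, published
    REJECTED = "rejected"      # hypothetical terminal state for "reject"


ALLOWED = {
    Stage.DEPOSITION: {Stage.VALIDATION},
    Stage.VALIDATION: {Stage.CURATION, Stage.DEPOSITION},  # failed checks return to the draft
    Stage.CURATION: {Stage.RECORD, Stage.REJECTED},
    Stage.RECORD: set(),    # no transitions out: records are immutable
    Stage.REJECTED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target if the transition table allows it, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.DEPOSITION
for nxt in (Stage.VALIDATION, Stage.CURATION, Stage.RECORD):
    stage = advance(stage, nxt)
print(stage)
```

An empty transition set for RECORD is one way to encode immutability. In this sketch, a new version would be a fresh deposition rather than a move backwards out of RECORD.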
OSA is in early development. The local development workflow is in good shape: osa start brings up a fully authenticated stack with no configuration. Two parts are already functional: the core write path (deposition through record publication) and the query layer (filtered search over records and feature tables). Export, federation, and the web UI are still in progress.
Protein Pocket Database
Semantic GEO Database
License: Apache 2.0