opensciencearchive/server

Open Science Archive

A domain-agnostic archive for AI-ready scientific data

⚠️ Under active development — OSA is pre-release software. APIs, data formats, and configuration will change without notice. Not yet suitable for production use or external contributions.


What is OSA?

OSA is both an open protocol and its reference implementation for scientific data deposition, validation, publication, discovery, and export. The goal is data infrastructure on par with the Protein Data Bank (PDB), available to any scientific domain.

Convention-driven submissions: Conventions bundle a metadata schema, validators, and file requirements into a single submission target.

Pluggable validation: Validators are OCI containers with a filesystem I/O contract. No network by default. Domain experts define quality checks; OSA runs them.

Structured Resource Names: Globally unique, node-scoped identifiers with clear versioning, in the form urn:osa:{domain}:{type}:{id}[@{version}].

Federation-ready: Nodes identified by DNS domain. Records flow between nodes via import, fork, and mirror, preserving provenance.

Quickstart

You don't need to clone this repo to run an OSA archive. The Python SDK (osa-py) ships the whole stack (Postgres, the server, and a docker-socket-proxy) and brings it up with one command.

pip install osa-py
osa init my-archive
cd my-archive
osa start

osa start spins up the stack via Docker Compose and mints a SUPERADMIN dev token so the CLI is authenticated immediately.

Define a convention in Python (schema + validation hooks + ingester), then:

osa deploy             # build hook images, register the convention
osa ingestion start    # pull records via the ingester

The full SDK reference — schemas, hooks, ingesters, the osa test end-to-end harness, and the convention manifest — lives in the osa-py README.

Hack on OSA

Working on the server itself:

git clone https://github.com/opensciencearchive/server.git
cd server
just dev    # Postgres + server + web with hot-reload

Run tests: cd server && just test. Lint + type check: just lint.

osa/
├── server/                  # Python backend (FastAPI)
│   ├── osa/
│   │   ├── domain/          # DDD bounded contexts
│   │   ├── application/     # API routes, DI wiring
│   │   └── infrastructure/  # Adapters (DB, K8s, S3)
│   ├── tests/               # Unit + integration tests
│   ├── migrations/          # Alembic migrations
│   └── sources/             # Data source plugins
├── web/                     # Next.js frontend
│   └── src/                 # React components, pages
└── deploy/                  # Docker Compose orchestration

Canonical Write Path

Deposition  ─→  Validation  ─→  Curation  ─→  Record  ─→  Search & Export
   draft          OCI hooks      approve/       immutable     indexed,
   metadata       structured     reject         versioned     exportable
   + files        checks                        published
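
The write path above can be read as a small state machine. The sketch below is one way to model it in plain Python, not the server's actual domain model. The `Stage` enum, the `advance` helper, and the terminal `REJECTED` state are hypothetical names introduced for illustration. Search and export are treated as consumers of published records rather than as states, which is also an assumption, and validation failures are not modeled because the diagram does not describe them.

```python
from enum import Enum


class Stage(Enum):
    DEPOSITION = "deposition"   # draft metadata + files
    VALIDATION = "validation"   # OCI hooks, structured checks
    CURATION = "curation"       # approve / reject
    RECORD = "record"           # immutable, versioned, published
    REJECTED = "rejected"       # hypothetical terminal state for a curation reject


# Allowed forward moves, following the arrows in the diagram.
_TRANSITIONS = {
    Stage.DEPOSITION: {Stage.VALIDATION},
    Stage.VALIDATION: {Stage.CURATION},
    Stage.CURATION: {Stage.RECORD, Stage.REJECTED},
    Stage.RECORD: set(),     # records are immutable, so nothing follows
    Stage.REJECTED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is allowed, else raise ValueError."""
    if target not in _TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    stage = Stage.DEPOSITION
    for nxt in (Stage.VALIDATION, Stage.CURATION, Stage.RECORD):
        stage = advance(stage, nxt)
    print(stage.value)
```

Giving `RECORD` no outgoing transitions reflects the "immutable" label in the diagram: a published record is never edited in place, so a change would presumably go through a new deposition and produce a new version.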

Status

OSA is in early development. The local-dev story is in good shape — osa start brings up a fully-authenticated stack with no config. The core write path (deposition through record publication) and the query layer (filtered search over records and feature tables) are both functional. Export, federation, and the web UI are still in progress.

Demos

Protein Pocket Database
Semantic GEO Database

License

Apache 2.0
