
OpenElementsLabs/db-backup-service

db-backup-service

Generic PostgreSQL backup sidecar — scheduled pg_dump backups with a small REST API for on-demand triggering, listing, and download.


db-backup-service is a small, single-purpose container that runs as a sidecar to a PostgreSQL database. It periodically takes compressed pg_dump backups and uploads them to a configurable storage backend (S3-compatible object storage or a local filesystem). On top of that scheduled behaviour it exposes a minimal, authenticated REST API that other services (typically a backend application) can use to trigger out-of-schedule backups, list existing backups, and download them — for example to populate a staging environment from production data.

The backup files are standard pg_dump --no-owner --no-privileges plain SQL, gzipped. They are restorable with stock psql even without this service. A bundled db-restore CLI inside the container provides an operator-friendly atomic restore path via gunzip -c <file> | psql --single-transaction.
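Producing such a file needs no temporary disk space: pg_dump's stdout can be compressed on the fly and written straight to the storage backend. The Java below is a minimal, hypothetical sketch of that pipeline, not the service's actual code. `DumpPipeline`, `gzipStream`, and `dumpTo` are invented names, `sink` stands in for whatever upload stream a storage backend provides, and the password is handed to pg_dump through libpq's standard PGPASSWORD environment variable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: pg_dump stdout -> gzip -> storage sink, with no temp file.
final class DumpPipeline {

    /** Copies raw SQL into the sink, gzip-compressing on the fly; returns uncompressed bytes copied. */
    static long gzipStream(InputStream sql, OutputStream sink) throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(sink, 64 * 1024);
        long copied = sql.transferTo(gz);
        // finish() writes the gzip trailer without closing the sink (the caller owns the upload stream).
        gz.finish();
        return copied;
    }

    /** Runs pg_dump with the flags documented above and streams its output into the sink. */
    static long dumpTo(OutputStream sink, String host, int port, String user, String password, String db)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(List.of(
                "pg_dump", "--no-owner", "--no-privileges",
                "-h", host, "-p", Integer.toString(port), "-U", user, db));
        pb.environment().put("PGPASSWORD", password); // keeps the password off the command line
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // surface pg_dump errors in the container log
        Process p = pb.start();
        long copied;
        try (InputStream stdout = p.getInputStream()) {
            copied = gzipStream(stdout, sink);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            // A non-zero exit means the dump is incomplete; the caller must discard the upload.
            throw new IOException("pg_dump exited with status " + exit);
        }
        return copied;
    }
}
```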

Features

  • Scheduled backups at a configurable interval (default 24 hours, any ISO-8601 duration or simple 24h/30m shorthand accepted).
  • On-demand backups via an authenticated REST API.
  • Two storage backends — S3-compatible (AWS S3, MinIO, Hetzner Object Storage, …) or a local filesystem path.
  • Atomic restore via the bundled db-restore CLI (drops and recreates the target database, restores in a single transaction so any error rolls the whole import back).
  • Retention policy with a "latest always kept" safety floor — old backups are pruned after RETENTION_DAYS, but the most recent successful backup is never deleted.
  • Streaming pipeline: pg_dump stdout is piped through gzip straight into the storage backend, so the container never needs disk space for a full backup file.
  • Structured JSON logs by default (LOG_FORMAT=plain available for human-readable output during development).
  • Health and info endpoints (/health, /info) suitable for Kubernetes probes and external monitoring. /info exposes backup.lastSuccessfulBackupAgeSeconds so a monitoring stack can alert on backup staleness even though the service itself has no built-in alerting.
  • Multi-arch container image for linux/amd64 and linux/arm64 published to ghcr.io/openelementslabs/db-backup-service on every Git tag.
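The retention rule with its "latest always kept" floor is small enough to sketch directly. The Java below is a hypothetical illustration: `RetentionSketch` and its `Backup` record are invented for this example and are not taken from the code base. It assumes the input list contains only successful backups, which is what makes the newest entry the one that must survive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of RETENTION_DAYS pruning with a safety floor.
final class RetentionSketch {

    /** Minimal stand-in for a listed backup: only the fields retention needs. */
    record Backup(String id, Instant createdAt) {}

    /** Returns the backups that would be deleted. Input must contain successful backups only. */
    static List<Backup> toPrune(List<Backup> backups, int retentionDays, Instant now) {
        if (backups.isEmpty()) {
            return List.of();
        }
        Instant cutoff = now.minus(Duration.ofDays(retentionDays));
        Backup newest = backups.stream()
                .max(Comparator.comparing(Backup::createdAt))
                .orElseThrow();
        List<Backup> prune = new ArrayList<>();
        for (Backup b : backups) {
            // Safety floor: the newest successful backup survives even if it is older than the cutoff.
            boolean isNewest = b.id().equals(newest.id());
            if (!isNewest && b.createdAt().isBefore(cutoff)) {
                prune.add(b);
            }
        }
        return prune;
    }
}
```

With RETENTION_DAYS=7 and no backup in the last week (for example because the database was unreachable), this keeps exactly one backup instead of deleting everything.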

Quick start

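Two complete Docker Compose examples live at the repository root, one using local storage (docker-compose.local.yml) and one using S3 storage with MinIO.

Both files are self-contained: set a strong API_TOKEN and you are running.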

Local storage (Docker Compose)

export API_TOKEN="$(openssl rand -hex 32)"
docker compose -f docker-compose.local.yml up

# Trigger an on-demand backup
curl -X POST -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups

# List backups (newest first)
curl -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups

# Download the latest backup
curl -OJ -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups/latest/download

S3 storage (Docker Compose, with MinIO)

The same curl snippets from the local example work against this stack unchanged — the storage backend is transparent to the API.

Configuration

All configuration is via environment variables. The table below is the single source of truth for operators.

| Variable | Required | Default | Purpose |
|---|---|---|---|
| DB_HOST | yes | | Hostname of the PostgreSQL server. |
| DB_PORT | no | 5432 | TCP port of the PostgreSQL server. |
| DB_NAME | yes | | Database name to back up. |
| DB_USER | yes | | PostgreSQL user. Must have privileges to dump all required schemas. |
| DB_PASSWORD | yes | | Password for DB_USER. |
| BACKUP_INTERVAL | no | 24h | ISO-8601 duration (PT24H) or simple format (24h, 12h, 30m). Drives the scheduler. |
| BACKUP_NAME_PREFIX | no | backup | Prefix for backup IDs and filenames. |
| RETENTION_DAYS | no | 7 | Backups older than this are pruned, except the most recent successful one, which is always kept. |
| STORAGE_BACKEND | yes | | s3 or local. |
| BACKUP_LOCAL_DIR | when STORAGE_BACKEND=local | | Mount point for local backups. |
| S3_BUCKET | when STORAGE_BACKEND=s3 | | Target bucket. |
| S3_PREFIX | no | backups | Key prefix inside the bucket. |
| S3_ENDPOINT | no | (AWS default) | Custom S3 endpoint for MinIO, Hetzner Object Storage, etc. |
| AWS_ACCESS_KEY_ID | when STORAGE_BACKEND=s3 | | S3 access key. |
| AWS_SECRET_ACCESS_KEY | when STORAGE_BACKEND=s3 | | S3 secret key. |
| AWS_DEFAULT_REGION | no | eu-central-1 | S3 region. |
| API_TOKEN | yes | | Static bearer token for the REST API. Generate a strong random value. Rotation requires a container restart. |
| HTTP_PORT | no | 8080 | Port the REST API listens on. |
| LOG_FORMAT | no | json | json or plain. |

Missing required variables cause the container to fail at startup with a clear, actionable error message.
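The two accepted BACKUP_INTERVAL formats can both be normalised to a java.time.Duration. The parser below is a hypothetical sketch, not the service's implementation. Accepting `s` and `d` as shorthand units (in addition to the documented `h` and `m`) and rejecting zero or negative intervals are assumptions; the error message illustrates the kind of actionable startup failure described above.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical BACKUP_INTERVAL parser: ISO-8601 (PT24H) or shorthand (24h, 30m).
final class IntervalParser {
    // Shorthand units s/m/h/d are an assumption; the README documents h and m.
    private static final Pattern SHORTHAND = Pattern.compile("(\\d+)([smhd])");

    static Duration parse(String raw) {
        String v = raw.trim();
        Duration d;
        if (v.toUpperCase(Locale.ROOT).startsWith("P")) {
            // ISO-8601, e.g. PT24H or P1D; Duration.parse throws DateTimeParseException if malformed.
            d = Duration.parse(v);
        } else {
            Matcher m = SHORTHAND.matcher(v.toLowerCase(Locale.ROOT));
            if (!m.matches()) {
                throw new IllegalArgumentException(
                        "BACKUP_INTERVAL must be ISO-8601 (PT24H) or shorthand (24h, 30m): " + raw);
            }
            long n = Long.parseLong(m.group(1));
            d = switch (m.group(2)) {
                case "s" -> Duration.ofSeconds(n);
                case "m" -> Duration.ofMinutes(n);
                case "h" -> Duration.ofHours(n);
                default -> Duration.ofDays(n); // the regex only admits "d" here
            };
        }
        if (d.isZero() || d.isNegative()) {
            throw new IllegalArgumentException("BACKUP_INTERVAL must be positive: " + raw);
        }
        return d;
    }
}
```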

REST API

All API endpoints are versioned under /api/v1/. Every endpoint except /health and /info requires the Authorization: Bearer <API_TOKEN> header. Mismatched or missing tokens return 401 Unauthorized.

Single-flight invariant: at most one backup runs at any time. The scheduler and the API share the same lock. Concurrent triggers are rejected with 409 Conflict, and the response body contains the running job's ID so the caller can poll its status.
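One way to enforce this invariant is a single atomic slot that both the scheduler and the API handler try to claim. The sketch below is hypothetical (`SingleFlight`, `tryStart`, and `finish` are illustrative names, not the service's API). A failed claim returns the ID of the job already holding the slot, which maps directly onto the 409 response body.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-flight guard shared by the scheduler and the REST API.
final class SingleFlight {

    /** started=true means a new job was claimed (202); false means jobId is the running job (409). */
    record Trigger(boolean started, String jobId) {}

    private final AtomicReference<String> running = new AtomicReference<>();

    Trigger tryStart() {
        String candidate = UUID.randomUUID().toString();
        while (true) {
            // Succeeds only if no job currently holds the slot.
            if (running.compareAndSet(null, candidate)) {
                return new Trigger(true, candidate);
            }
            String current = running.get();
            if (current != null) {
                return new Trigger(false, current);
            }
            // The running job finished between the two reads; try to claim the slot again.
        }
    }

    /** Releases the slot, but only if it is still held by the given job. */
    void finish(String jobId) {
        running.updateAndGet(cur -> jobId.equals(cur) ? null : cur);
    }
}
```

An API handler would answer 202 when `started()` is true and 409 otherwise, returning `jobId()` in both cases; the scheduler could simply skip its tick when the claim fails.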

Endpoint summary

| Method | Path | Purpose | Status codes |
|---|---|---|---|
| POST | /api/v1/backups | Trigger a new backup. | 202 (new job), 409 (running job ID), 401 |
| GET | /api/v1/backups/jobs/{jobId} | Job status. | 200, 401, 404 |
| GET | /api/v1/backups | List all available backups, newest first. | 200, 401 |
| GET | /api/v1/backups/latest | Metadata of the latest successful backup. | 200, 401, 404 (no successful backup yet) |
| GET | /api/v1/backups/{id}/download | Download a specific backup as application/gzip. | 200, 401, 404 |
| GET | /api/v1/backups/latest/download | Download the latest successful backup. | 200, 401, 404 |
| GET | /health | Liveness + readiness, no auth. | 200, 503 |
| GET | /info | Service version, PG client version, retention config, last-successful-backup age, no auth. | 200 |

Examples

# Trigger a backup (returns 202 with the new job, or 409 if one is in flight)
curl -X POST -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups

# Poll a specific job
curl -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups/jobs/<jobId>

# List all backups (newest first)
curl -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups

# Metadata of the latest successful backup
curl -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups/latest

# Download a specific backup (preserves filename via -OJ)
curl -OJ -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups/<id>/download

# Download the latest backup
curl -OJ -H "Authorization: Bearer $API_TOKEN" \
     http://localhost:8080/api/v1/backups/latest/download

# Health (no auth)
curl http://localhost:8080/health

# Info (no auth)
curl http://localhost:8080/info
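A backend application written in Java could issue the same trigger call with the JDK's built-in java.net.http client, as in this hypothetical sketch (`BackupApiClient` and its methods are illustrative names). Both 202 and 409 leave the caller with a job ID to poll. Extracting that ID from the JSON body is left to whatever JSON library the caller already uses, and the sketch assumes the service is mounted at the root of `baseUrl`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical caller-side client for POST /api/v1/backups.
final class BackupApiClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI base;
    private final String token;

    BackupApiClient(String baseUrl, String token) {
        this.base = URI.create(baseUrl); // assumed to point at the service root, e.g. http://localhost:8080
        this.token = token;
    }

    HttpRequest triggerRequest() {
        return HttpRequest.newBuilder(base.resolve("/api/v1/backups"))
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** 202 = new job started, 409 = a job is already running; both bodies carry a job ID to poll. */
    static boolean isPollable(int status) {
        return status == 202 || status == 409;
    }

    /** Sends the trigger and returns the raw JSON body containing the job ID. */
    String trigger() throws IOException, InterruptedException {
        HttpResponse<String> resp = http.send(triggerRequest(), HttpResponse.BodyHandlers.ofString());
        if (!isPollable(resp.statusCode())) {
            throw new IOException("unexpected status " + resp.statusCode()); // e.g. 401 on a bad token
        }
        return resp.body();
    }
}
```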

Job model

{
  "jobId": "9b7c4a8e-...",
  "status": "queued | running | succeeded | failed",
  "triggeredBy": "scheduler | api",
  "startedAt": "2026-05-10T01:00:00Z",
  "finishedAt": "2026-05-10T01:01:14Z",
  "durationMs": 74123,
  "errorMessage": null,
  "backupId": "backup_20260510T010000Z.sql.gz"
}

Jobs are kept in memory only — the most recent 100 with FIFO eviction. Jobs are lost on container restart. Long-term history comes from the backup listing.
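A bounded, insertion-ordered map is enough for this kind of job store. The hypothetical sketch below (`JobHistory` is an illustrative name) uses LinkedHashMap's `removeEldestEntry` hook to evict the oldest job once more than 100 are held. Updating an existing job's status with `put` does not change its position in an insertion-ordered map, so eviction stays FIFO by creation time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory job store: most recent 100 jobs, FIFO eviction, lost on restart.
final class JobHistory<J> {
    static final int MAX_JOBS = 100;

    // accessOrder=false keeps insertion order, so the eldest entry is the oldest job created.
    private final Map<String, J> jobs = new LinkedHashMap<String, J>(16, 0.75f, false) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, J> eldest) {
            return size() > MAX_JOBS;
        }
    };

    synchronized void put(String jobId, J job) {
        jobs.put(jobId, job);
    }

    synchronized J get(String jobId) {
        return jobs.get(jobId); // null once evicted, which an API would surface as 404
    }

    synchronized int size() {
        return jobs.size();
    }
}
```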

Backup metadata model

{
  "id": "backup_20260510T010000Z.sql.gz",
  "createdAt": "2026-05-10T01:01:14Z",
  "sizeBytes": 8421376,
  "sha256": "fa3c…",
  "pgVersion": "17.2",
  "durationMs": 74123,
  "triggeredBy": "scheduler | api"
}
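Backup IDs such as backup_20260510T010000Z.sql.gz appear to combine BACKUP_NAME_PREFIX with the job's UTC start time (compare startedAt in the job model). A hypothetical helper producing that shape, together with the matching sidecar name, could look like this; the service's own formatting code may differ.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical naming helper: <prefix>_<yyyyMMdd'T'HHmmss'Z'>.sql.gz plus its sidecar.
final class BackupIds {
    // Fixed-width UTC stamp; sub-second precision is dropped by the pattern.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

    static String backupId(String prefix, Instant startedAt) {
        return prefix + "_" + STAMP.format(startedAt) + ".sql.gz";
    }

    static String sidecarName(String backupId) {
        return backupId + ".meta.json";
    }
}
```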

Restore

Restore is intentionally not exposed via the REST API — that path is too easy to misuse catastrophically. Instead, a bundled db-restore CLI is shipped inside the container and is invoked by an operator via docker exec:

# List available backups (newest first)
docker exec <container> db-restore

# Restore a specific backup (5-second abort window — Ctrl-C to cancel)
docker exec <container> db-restore backup_20260510T010000Z.sql.gz

# Restore without the abort window (for automation)
docker exec <container> db-restore --force backup_20260510T010000Z.sql.gz

The script drops and recreates the target database, then restores via gunzip -c <file> | psql --single-transaction. The --single-transaction flag ensures the entire restore is atomic: any error rolls the whole import back, leaving the database empty (freshly recreated) instead of half-imported. The operator can then choose a different backup.

The script reads connection info and storage configuration from the same environment variables as the Spring application. For the S3 backend it downloads the dump through the running service's authenticated REST API instead of implementing AWS SigV4 request signing itself, so the running service must be healthy for db-restore to work with the S3 backend.

Security & network exposure

The service is designed for internal-network use only. Do not expose it directly to the public Internet.

Specifically:

  • No TLS. The HTTP server is plain HTTP. If you need TLS, terminate it in an upstream reverse-proxy (nginx, Traefik, an ingress controller) on the same internal network and configure the proxy to forward to the service.
  • No rate limiting, no CORS, no IP allow-list, no WAF. These are the upstream reverse-proxy's job.
  • Single static bearer token. API_TOKEN is the only credential. Generate it with a strong random source (openssl rand -hex 32 or equivalent). Token rotation requires a container restart — there is no in-band rotation API.
  • No per-user authorisation. The service has no concept of users. Per-user authentication, role checks, and audit logging are the calling application's responsibility. Typically a backend application proxies frontend calls to the backup service and adds its own user-aware authorisation layer in front of the bearer token.
  • /health and /info are unauthenticated. They reveal the configured retention days, the configured backup interval, the bundled pg_dump version, and the age of the last successful backup — designed for monitoring tools on the same trusted network. Do not expose them publicly.

If you need to expose the API beyond your internal network, the only supported pattern is: put a reverse-proxy in front that adds TLS, rate limiting, and whatever additional authentication layer your environment requires. The backup service itself will not grow these concerns — they belong upstream.

Backup format

  • Tool: pg_dump from the PostgreSQL 17 client suite, piped through gzip. pg_dump can dump from servers running older major versions (back to 9.2 for the PostgreSQL 17 client), so the same image backs up servers from 9.2 through 17.
  • Format: plain SQL, gzipped (.sql.gz). Restorable with stock psql even without this service.
  • Flags: --no-owner --no-privileges for portability across environments with different role names.
  • Consistency: the default pg_dump snapshot mode (REPEATABLE READ). Each backup is a self-consistent point-in-time snapshot.
  • Validation: each finalised dump is read back through GZIPInputStream and its SHA-256 is recorded in the sidecar JSON. A failed validation marks the job as failed and discards the partial upload.
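The read-back validation can be sketched with the JDK's GZIPInputStream and DigestInputStream: decompressing the entire stream proves it is intact, while the digest is computed over the bytes as they are read. This is a hypothetical illustration (`DumpValidator` is an invented name). Whether the recorded SHA-256 covers the compressed .sql.gz bytes, as assumed here, or the decompressed SQL is not stated above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.zip.GZIPInputStream;

// Hypothetical read-back check: full gunzip for integrity, SHA-256 over the compressed bytes.
final class DumpValidator {

    /** Returns the hex SHA-256 of the compressed stream; throws IOException if the gzip data is corrupt or truncated. */
    static String validateAndHash(InputStream compressed) throws IOException {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        DigestInputStream digesting = new DigestInputStream(compressed, sha256);
        try (GZIPInputStream gz = new GZIPInputStream(digesting)) {
            byte[] buf = new byte[64 * 1024];
            while (gz.read(buf) != -1) {
                // Discard the SQL; a damaged stream surfaces here as EOFException or ZipException.
            }
            // Feed any bytes the gzip reader did not consume into the hash as well.
            while (digesting.read(buf) != -1) {
                // hash only
            }
        }
        return HexFormat.of().formatHex(sha256.digest());
    }
}
```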

Each successful backup produces two storage objects:

<prefix>/backup_20260510T010000Z.sql.gz            ← the dump
<prefix>/backup_20260510T010000Z.sql.gz.meta.json  ← sidecar metadata

The sidecar JSON is written only after the dump is fully uploaded and verified. Listings ignore dumps without a sidecar, so a partially uploaded backup (e.g. interrupted by a container restart) never appears as "latest".
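The sidecar-gated listing rule can be expressed as a filter over object names. In this hypothetical sketch (`BackupListing` is an invented name), a dump is listed only if its .meta.json sidecar exists. Names are sorted in reverse lexicographic order, which equals newest first only because all IDs are assumed to share one prefix and a fixed-width UTC timestamp; the service may instead sort on createdAt from the sidecar.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical listing: ignore dumps without a sidecar, newest first.
final class BackupListing {
    static final String SIDECAR_SUFFIX = ".meta.json";

    static List<String> listCompleted(Set<String> objectNames) {
        return objectNames.stream()
                .filter(n -> n.endsWith(".sql.gz"))
                // No sidecar means the upload never finished or was never verified.
                .filter(n -> objectNames.contains(n + SIDECAR_SUFFIX))
                // Fixed-width UTC stamps with a shared prefix sort chronologically as strings.
                .sorted(Comparator.reverseOrder())
                .toList();
    }
}
```

The first element of the result, if any, is what a "latest" lookup would return, so an interrupted upload can never be served as the latest backup.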

Observability

  • Logs. Structured JSON by default, one event per line. Switchable to plain text via LOG_FORMAT=plain. Contextual fields (jobId, backupId, durationMs, triggeredBy) are attached via MDC for the duration of a backup job.
  • /health. Spring Boot Actuator endpoint, unauthenticated. Returns 200 only when both the PostgreSQL server is reachable on the configured host/port and the storage backend is reachable (S3 HeadBucket or a writable local directory). Returns 503 otherwise. Suitable as a Kubernetes readiness/liveness probe.
  • /info. Unauthenticated. Reports the service version, the bundled pg_dump version, the configured retention days, the configured backup interval (in both ISO-8601 and seconds), and backup.lastSuccessfulBackupAgeSeconds — a long, or null when no successful backup exists yet. Alert on lastSuccessfulBackupAgeSeconds > <SLO> to detect backup staleness in your monitoring stack.
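The staleness field and a matching alert condition reduce to a few lines. This hypothetical sketch (`Staleness` is an invented name) computes the age in whole seconds, returns null when no successful backup exists yet, and, as an assumption suited to alerting, treats a missing backup as stale.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helpers mirroring backup.lastSuccessfulBackupAgeSeconds and an SLO check.
final class Staleness {

    /** Age in whole seconds, or null when no successful backup exists yet. */
    static Long lastSuccessfulBackupAgeSeconds(Instant lastSuccess, Instant now) {
        return lastSuccess == null ? null : Duration.between(lastSuccess, now).getSeconds();
    }

    /** True when the age exceeds the SLO; a missing backup counts as stale (assumption). */
    static boolean isStale(Long ageSeconds, Duration slo) {
        return ageSeconds == null || ageSeconds > slo.getSeconds();
    }
}
```

With the default 24h interval, an SLO somewhat above one interval (for example 25 hours) tolerates normal scheduling jitter while still catching a missed run.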

Development

Prerequisites

  • Java 21 (LTS). We recommend installing it via SDKMAN! or Eclipse Adoptium / Temurin.
  • Maven 3.9+.
  • Docker. Required for the Testcontainers-based integration tests and for building the container image.

Build from source

mvn verify

This compiles the project, runs the unit and integration test suites, and produces an executable Spring Boot fat JAR at target/db-backup-service.jar. CI (.github/workflows/ci.yml) runs exactly this command on every push and pull request.

Build the container image locally

docker build -f docker/Dockerfile -t db-backup-service:dev .

The image is based on eclipse-temurin:21-jre and bundles the PostgreSQL 17 client tools (pg_dump, pg_restore, psql), gzip, bash, curl, and jq. The bundled db-restore CLI lives at /usr/local/bin/db-restore.

Spec-driven workflow

Non-trivial changes go through a small spec before implementation. The spec folder lives at specs/<NNN>-<short-description>/ and contains a design.md and a behaviors.md (and optionally a steps.md). See specs/INDEX.md for the catalogue of existing specs and .claude/conventions/spec-driven-development.md for the convention itself.

Contributing

See CONTRIBUTING.md for development setup, the spec-driven workflow, commit message conventions, and the pull-request review process. Participation in the project is conditional on accepting our Code of Conduct.

Vulnerability reports are handled via SECURITY.md.

License

Apache License 2.0. See LICENSE.
