Merged
7 changes: 7 additions & 0 deletions .dockerignore
@@ -0,0 +1,7 @@
.git
.github
web/node_modules
.superpowers
docs
*.test
relay
9 changes: 9 additions & 0 deletions .github/workflows/ci.yml
@@ -91,3 +91,12 @@ jobs:
# Fail if the committed web/dist is stale vs a fresh build of the source.
- name: Verify committed dist is in sync
run: git diff --exit-code -- dist

docker:
name: docker build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Build image
run: docker build -t relay:ci .
43 changes: 29 additions & 14 deletions CLAUDE.md
@@ -10,12 +10,13 @@ project: the point is to *prove understanding of queue internals*, not to wrap a
library. Do not introduce a queue dependency (BullMQ, asynq, Machinery, Celery, etc.) — the
mechanics are the deliverable.

**Status: Phase 1 complete; Phase 2 complete; Phase 3 in progress — 3a (HTTP API + server) ✅,
3b (dashboard) ✅, 3c (producer SDK) ✅ done.** The core engine plus delayed jobs, the promoter,
retry backoff, priority, idempotency enforcement, per-queue rate limiting, Prometheus metrics, the
JSON REST API + server, the embedded React dashboard, and the stdlib-only HTTP producer SDK are
built, tested against a real Redis under `-race`, and CI is green. Only packaging/deploy (3d)
remains. Repo: <https://github.com/StrangeNoob/relay>. What exists today:
**Status: Phases 1–3 complete.** 3a (HTTP API + server) ✅, 3b (dashboard) ✅, 3c (producer SDK)
✅, 3d (packaging/deploy/README) ✅. The core engine plus delayed jobs, the promoter, retry
backoff, priority, idempotency enforcement, per-queue rate limiting, Prometheus metrics, the JSON
REST API + server, the embedded React dashboard, the stdlib-only HTTP producer SDK, and the
Docker/Compose packaging are all built, tested against a real Redis under `-race`, and CI is
green. Only "Future work" items (Postgres SKIP LOCKED mode, exactly-once outbox) remain, which
were always out of scope. Repo: <https://github.com/StrangeNoob/relay>. What exists today:

- `internal/job` — the `Job` model + Redis-hash encoding (`ToHash`/`FromHash`).
- `internal/broker` — `Enqueue` (with `WithDelay`/`WithReadyAt`/`WithPriority`/`WithIdempotencyKey` options), atomic `Claim`, `Ack`,
@@ -59,10 +60,18 @@ remains. Repo: <https://github.com/StrangeNoob/relay>. What exists today:
`cmd/server` with SPA index.html fallback). Includes vitest unit tests for pure logic
(format helpers, series builders) and a snapshot test. `web/` has its own `package.json`; the
Go module gains no dependency.
- `Dockerfile` — multi-stage distroless image; builds all three binaries (`cmd/server`,
`cmd/worker`, `cmd/demo`) into one shared image (compose tags it `relay:local`).
- `.dockerignore` — trims the Docker build context (excludes `.git`, `.github`,
`web/node_modules`, `.superpowers`, `docs`, `*.test`, and `relay`); keeps `web/dist` so the
server can embed it.
- `deployments/docker-compose.yml` — redis + server + worker (1 by default, scale with
`--scale worker=N`) + one-shot demo; `docker compose -f deployments/docker-compose.yml up --build`
brings up a fully working end-to-end stack (dashboard at `/`, `/healthz`, `/metrics` all
functional).
- `README.md` — portfolio front page with Mermaid architecture diagram, feature list, quickstart
(native + Docker), and deploy notes.
- `.github/workflows/ci.yml` — Redis service + `go test -race` + `golangci-lint` + dashboard
build/typecheck/test/dist-sync check.

Packaging/deploy (3d) is **not** built yet.
build/typecheck/test/dist-sync check + `docker build` job.

## Source of truth

@@ -106,6 +115,7 @@ spec disagree, the spec wins until the spec is deliberately updated.
- **Committed `web/dist` must be rebuilt on UI change.** The Go binary embeds the committed dist; CI has a `git diff --exit-code -- dist` step to catch stale builds. Run `cd web && npm run build` and commit the updated dist whenever source changes.
- **Producer SDK does no client-side retries.** `internal/client` makes one HTTP request per call; transient failures are surfaced as errors. The caller is responsible for retry logic (with backoff) if needed.
- **`cmd/demo` requires a running `cmd/server`.** The demo load generator now produces jobs through the HTTP SDK (`-server` flag) and no longer talks to Redis directly. Running `cmd/demo` without `cmd/server` will produce connection errors immediately.
- **Docker/Compose packaging notes.** The compose Redis has no volume mount — data is ephemeral and lost on `docker compose down`. The `demo` service is one-shot (exits 0 after enqueuing; `restart: on-failure` lets it retry through the brief server-startup race); workers and server continue running. The distroless image has no shell (`/bin/sh` is absent), so `docker exec` interactive debugging is not available. Deploying to a live environment (Railway, Fly.io, etc.) is the operator's step; the compose stack is a local demo, not a production-hardened deployment.

## Redis data model & job lifecycle (the architecture in brief)

@@ -162,8 +172,11 @@ internal/metrics/ # ✅ Prometheus Recorder + DepthCollector
internal/api/ # ✅ JSON REST API handler (Phase 3a)
internal/client/ # ✅ stdlib-only HTTP producer SDK (Phase 3c)
web/ # ✅ Vite+React dashboard + web/embed.go (Phase 3b)
deployments/docker-compose.yml # ◻ redis + server + N workers + demo (Phase 3d)
.github/workflows/ci.yml # ✅ Redis service + go test -race + golangci-lint + dashboard CI
Dockerfile # ✅ multi-stage distroless image (Phase 3d)
.dockerignore # ✅ trims the Docker build context (Phase 3d)
deployments/docker-compose.yml # ✅ redis + server + N workers + demo (Phase 3d)
README.md # ✅ portfolio front page with diagram + quickstart (Phase 3d)
.github/workflows/ci.yml # ✅ Redis service + go test -race + golangci-lint + dashboard CI + docker build
```

Use `internal/` for everything not meant as a public import surface. `cmd/` holds only thin
@@ -173,7 +186,7 @@ Use `internal/` for everything not meant as a public import surface. `cmd/` hold

1. **Phase 1 — core: ✅ done.** job model; enqueue/claim/ack/nack Lua; reaper; worker runtime; basic DLQ; integration tests; CI. A working, testable queue ships first.
2. **Phase 2 — depth: ✅ done.** delayed jobs + promoter ✅; backoff + jitter ✅; priority ✅; idempotency ✅; rate limiting ✅; Prometheus metrics ✅.
3. **Phase 3 — polish (in progress):** 3a HTTP API + server ✅; 3b dashboard ✅; 3c producer SDK (`internal/client`) ✅; 3d docker-compose + deployed demo + README diagram.
3. **Phase 3 — polish: ✅ done.** 3a HTTP API + server ✅; 3b dashboard ✅; 3c producer SDK (`internal/client`) ✅; 3d Dockerfile + docker-compose + README ✅.
4. **Future work (NOT now):** Postgres-backed (`SKIP LOCKED`) mode; exactly-once via consumer outbox.

## Conventions
@@ -223,6 +236,8 @@ go run ./cmd/demo -server http://localhost:8080 -queue demo -count 100 # enqu

# frontend dev/test (requires Node 20+):
cd web && npm ci && npm run typecheck && npm run test && npm run build
```

Keep this section updated as the Makefile / docker-compose take shape.
# Docker quickstart (all-in-one):
docker compose -f deployments/docker-compose.yml up --build
# then open http://localhost:8080 — dashboard, /healthz, and /metrics all available
```
26 changes: 26 additions & 0 deletions Dockerfile
@@ -0,0 +1,26 @@
# syntax=docker/dockerfile:1

# --- builder ---
# Pinned to match the toolchain in go.mod (toolchain go1.25.11).
FROM golang:1.25.11 AS build
WORKDIR /src

# Cache module downloads.
COPY go.mod go.sum ./
RUN go mod download

# Build all three binaries. web/dist is committed, so the server embeds the
# dashboard with no Node step.
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/server ./cmd/server \
&& CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/worker ./cmd/worker \
&& CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/demo ./cmd/demo

# --- runtime ---
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/server /usr/local/bin/server
COPY --from=build /out/worker /usr/local/bin/worker
COPY --from=build /out/demo /usr/local/bin/demo
EXPOSE 8080
# Each compose service overrides `command`; default to the server.
CMD ["/usr/local/bin/server"]
124 changes: 123 additions & 1 deletion README.md
@@ -2,4 +2,126 @@

[![CI](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml/badge.svg)](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml)

Distributed task queue in Go (Redis-backed). Design spec in `docs/superpowers/specs/`.
A distributed task queue built **from scratch on Redis primitives**, in Go. The point of this
project is to prove understanding of queue internals — the atomic claim, visibility timeouts, the
reaper, retries, priority, idempotency, rate limiting — rather than to wrap an existing library.

## Architecture

```mermaid
flowchart LR
subgraph producers[Producers]
SDK["internal/client (Go SDK)"]
DEMO["cmd/demo (load gen)"]
end
SRV["cmd/server<br/>HTTP API · /metrics · /healthz<br/>embedded dashboard"]
DASH["Dashboard (web/, embedded)"]
subgraph pool["cmd/worker (competing consumers)"]
CLAIM["claim loop → handler"]
REAP["reaper"]
PROM["promoter"]
end
RDS[("Redis<br/>queues + job hashes")]

SDK -->|"POST /api/queues/{q}/jobs"| SRV
DEMO -->|HTTP| SRV
SRV -->|enqueue · stats · DLQ · requeue| RDS
SRV -. serves .-> DASH
DASH -->|SSE + REST| SRV
CLAIM -->|atomic claim / ack / nack| RDS
REAP -->|requeue expired in-flight| RDS
PROM -->|promote due delayed| RDS
```

Producers enqueue over HTTP (or the Go SDK); the server is a thin JSON layer over the broker and
also serves the live dashboard and Prometheus metrics. Workers are competing consumers that claim
jobs atomically, run a handler, and ack/nack; two background loops (reaper, promoter) plus an
operator requeue are the only other things that move jobs between states. Redis is the durable
substrate — every queue guarantee is enforced by our own logic and embedded Lua scripts.

## Delivery semantics & invariants

- **At-least-once delivery, never exactly-once.** Idempotency keys let consumers dedup; nothing here
claims exactly-once.
- **The atomic claim is sacred.** Popping a job from `ready`, adding it to `inflight` under a
visibility deadline, and bumping attempts is a single Lua script — competing consumers can never
claim the same job.
- **Crash safety comes from the reaper.** A worker dying mid-job is recovered because its visibility
deadline expires and the reaper requeues the job.
- **Built from scratch on Redis primitives.** The only Go dependencies are a Redis driver and the
Prometheus client; the queue logic is ours.
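
Because delivery is at-least-once, a handler can see the same job twice (for example after a reaper requeue). A minimal sketch of consumer-side dedup on the idempotency key follows. The `Job` struct, `Deduper`, and `Handle` are hypothetical names for illustration, not Relay's types, and the seen-set is kept in memory, whereas a real consumer would store it next to the data the handler writes.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical stand-in for a delivered job; the field names are
// assumptions for this sketch.
type Job struct {
	ID             string
	IdempotencyKey string
}

// Deduper remembers idempotency keys whose side effects already completed.
type Deduper struct {
	mu   sync.Mutex
	done map[string]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{done: make(map[string]struct{})}
}

// Handle runs fn at most once per non-empty idempotency key and reports
// whether fn ran. The key is recorded only after fn succeeds, so a failed
// attempt is still retried. Holding the lock across fn keeps the sketch
// simple at the cost of serializing handlers.
func (d *Deduper) Handle(j Job, fn func(Job) error) (bool, error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	if j.IdempotencyKey != "" {
		if _, seen := d.done[j.IdempotencyKey]; seen {
			return false, nil // redelivery: skip the side effect, still ack
		}
	}
	if err := fn(j); err != nil {
		return false, err
	}
	if j.IdempotencyKey != "" {
		d.done[j.IdempotencyKey] = struct{}{}
	}
	return true, nil
}

func main() {
	d := NewDeduper()
	charges := 0
	charge := func(Job) error { charges++; return nil }

	job := Job{ID: "1", IdempotencyKey: "order-42"}
	d.Handle(job, charge) // first delivery
	d.Handle(job, charge) // redelivery of the same job
	fmt.Println("charges:", charges) // prints "charges: 1"
}
```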

## Features

Competing consumers · priority queues · delayed/scheduled jobs · retries with full-jitter backoff ·
dead-letter queue with inspect + requeue · visibility timeout + reaper · idempotency keys · per-queue
rate limiting (token bucket) · Prometheus metrics · live dashboard · a producer SDK.

## Quickstart (Docker)

```bash
docker compose -f deployments/docker-compose.yml up --build
```

Then open <http://localhost:8080>. The `demo` container enqueues 200 jobs; the worker processes them
(failing ~10% so you get retries and a dead-letter queue to watch). The dashboard shows live queue
depth, throughput, and the DLQ — click **Requeue** on a dead job to send it back. Scale the workers:

```bash
docker compose -f deployments/docker-compose.yml up --build --scale worker=3
```

Generate more load any time:

```bash
docker compose -f deployments/docker-compose.yml run --rm demo \
/usr/local/bin/demo -server http://server:8080 -queue demo -count 500
```

## Local development

Needs Go 1.24+ and a Redis on `localhost:6379` (tests skip when none is reachable).

```bash
go run ./cmd/server -queues demo # API + dashboard on :8080
go run ./cmd/worker -queue demo -concurrency 4
go run ./cmd/demo -server http://localhost:8080 -queue demo -count 100

go test -race ./... # broker/worker/api/client tests use real Redis
golangci-lint run
```

The dashboard lives in `web/` (Vite + React + TypeScript); rebuild it with `cd web && npm ci && npm run build` (the built `web/dist` is committed and embedded into the server).

## Project layout

```
cmd/{server,worker,demo} # thin entrypoints
internal/job # job model + Redis-hash encoding
internal/broker # the engine: enqueue/claim/ack/nack/reap/promote + Lua scripts
internal/worker # consumer runtime (claim loop, reaper, promoter)
internal/metrics # Prometheus recorder + depth collector
internal/api # HTTP JSON API + SSE stream
internal/client # producer SDK (stdlib-only HTTP client)
web/ # embedded dashboard (Vite + React + TS)
deployments/ # docker-compose stack
```

## Deploy

The image is a self-contained binary set, so any container host works. Example (Fly.io-style):

1. Provision a managed Redis and note its address.
2. Build and push the image: `docker build -t <registry>/relay:latest . && docker push <registry>/relay:latest`.
3. Run the **server** (`/usr/local/bin/server -addr :8080 -redis <redis-addr> -queues <queues>`),
exposing port 8080, and one or more **workers**
(`/usr/local/bin/worker -redis <redis-addr> -queue <queue>`), pointed at the same Redis.
4. Point producers at the server's URL (the Go SDK, or `cmd/demo -server <url>`).

There is no auth — put it behind your platform's access controls if exposed publicly.

## Design docs

The authoritative designs live in [`docs/superpowers/specs/`](docs/superpowers/specs/); the base
design is the source of truth for architecture and delivery semantics. `CLAUDE.md` summarizes the
data model, invariants, and known limitations.
52 changes: 52 additions & 0 deletions deployments/docker-compose.yml
@@ -0,0 +1,52 @@
# Runs the whole Relay system end-to-end: Redis, the API/dashboard server, a pool
# of workers (competing consumers + reaper + promoter), and a one-shot demo load
# generator. Open http://localhost:8080 after `up` to watch the dashboard.
#
# docker compose -f deployments/docker-compose.yml up --build
# docker compose -f deployments/docker-compose.yml up --build --scale worker=3

services:
redis:
image: redis:7
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 3s
timeout: 3s
retries: 10

server:
build:
context: ..
dockerfile: Dockerfile
image: relay:local
command: ["/usr/local/bin/server", "-addr", ":8080", "-redis", "redis:6379", "-queues", "demo"]
ports:
- "8080:8080"
depends_on:
redis:
condition: service_healthy

worker:
build:
context: ..
dockerfile: Dockerfile
image: relay:local
command: ["/usr/local/bin/worker", "-redis", "redis:6379", "-queue", "demo", "-concurrency", "4", "-fail-rate", "0.1"]
depends_on:
redis:
condition: service_healthy

demo:
build:
context: ..
dockerfile: Dockerfile
image: relay:local
command: ["/usr/local/bin/demo", "-server", "http://server:8080", "-queue", "demo", "-count", "200"]
depends_on:
server:
condition: service_started
# depends_on only waits for the server container to start, not for its port to
# bind. on-failure lets the one-shot demo retry through that brief startup race
# (the distroless image has no shell/curl for a proper HTTP healthcheck); once
# it succeeds it exits 0 and is not restarted.
restart: on-failure