diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..2e4f1b4
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+.git
+.github
+web/node_modules
+.superpowers
+docs
+*.test
+relay
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 1e747d9..54165e9 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -91,3 +91,12 @@ jobs:
       # Fail if the committed web/dist is stale vs a fresh build of the source.
       - name: Verify committed dist is in sync
         run: git diff --exit-code -- dist
+
+  docker:
+    name: docker build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build image
+        run: docker build -t relay:ci .
diff --git a/CLAUDE.md b/CLAUDE.md
index 6a0bac1..f626c3d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,12 +10,13 @@ project: the point is to *prove understanding of queue internals*, not to wrap a
 library. Do not introduce a queue dependency (BullMQ, asynq, Machinery, Celery, etc.) — the
 mechanics are the deliverable.
 
-**Status: Phase 1 complete; Phase 2 complete; Phase 3 in progress — 3a (HTTP API + server) ✅,
-3b (dashboard) ✅, 3c (producer SDK) ✅ done.** The core engine plus delayed jobs, the promoter,
-retry backoff, priority, idempotency enforcement, per-queue rate limiting, Prometheus metrics, the
-JSON REST API + server, the embedded React dashboard, and the stdlib-only HTTP producer SDK are
-built, tested against a real Redis under `-race`, and CI is green. Only packaging/deploy (3d)
-remains. Repo: . What exists today:
+**Status: Phases 1–3 complete.** 3a (HTTP API + server) ✅, 3b (dashboard) ✅, 3c (producer SDK)
+✅, 3d (packaging/deploy/README) ✅.
The core engine plus delayed jobs, the promoter, retry
+backoff, priority, idempotency enforcement, per-queue rate limiting, Prometheus metrics, the JSON
+REST API + server, the embedded React dashboard, the stdlib-only HTTP producer SDK, and the
+Docker/Compose packaging are all built, tested against a real Redis under `-race`, and CI is
+green. Only "Future work" items (Postgres SKIP LOCKED mode, exactly-once outbox) remain, which
+were always out of scope. Repo: . What exists today:
 - `internal/job` — the `Job` model + Redis-hash encoding (`ToHash`/`FromHash`).
 - `internal/broker` — `Enqueue` (with `WithDelay`/`WithReadyAt`/`WithPriority`/`WithIdempotencyKey`
   options), atomic `Claim`, `Ack`,
@@ -59,10 +60,18 @@ remains. Repo: . What exists today:
   `cmd/server` with SPA index.html fallback). Includes vitest unit tests for pure logic (format
   helpers, series builders) and a snapshot test. `web/` has its own `package.json`; the Go module
   gains no dependency.
+- `Dockerfile` — multi-stage distroless image; builds all three binaries (`cmd/server`,
+  `cmd/worker`, `cmd/demo`) into one shared image (compose tags it `relay:local`).
+- `.dockerignore` — trims the Docker build context (excludes `.git`, `web/node_modules`,
+  `.superpowers`, `docs`); keeps `web/dist` so the server can embed it.
+- `deployments/docker-compose.yml` — redis + server + worker (1 by default, scale with
+  `--scale worker=N`) + one-shot demo; `docker compose -f deployments/docker-compose.yml up --build`
+  brings up a fully working end-to-end stack (dashboard at `/`, `/healthz`, `/metrics` all
+  functional).
+- `README.md` — portfolio front page with Mermaid architecture diagram, feature list, quickstart
+  (native + Docker), and deploy notes.
 - `.github/workflows/ci.yml` — Redis service + `go test -race` + `golangci-lint` + dashboard
-  build/typecheck/test/dist-sync check.
-
-Packaging/deploy (3d) is **not** built yet.
+  build/typecheck/test/dist-sync check + `docker build` job.
 
 ## Source of truth
 
@@ -106,6 +115,7 @@ spec disagree, the spec wins until the spec is deliberately updated.
 - **Committed `web/dist` must be rebuilt on UI change.** The Go binary embeds the committed dist; CI has a `git diff --exit-code -- dist` step to catch stale builds. Run `cd web && npm run build` and commit the updated dist whenever source changes.
 - **Producer SDK does no client-side retries.** `internal/client` makes one HTTP request per call; transient failures are surfaced as errors. The caller is responsible for retry logic (with backoff) if needed.
 - **`cmd/demo` requires a running `cmd/server`.** The demo load generator now produces jobs through the HTTP SDK (`-server` flag) and no longer talks to Redis directly. Running `cmd/demo` without `cmd/server` will produce connection errors immediately.
+- **Docker/Compose packaging notes.** The compose Redis has no volume mount — data is ephemeral and lost on `docker compose down`. The `demo` service is one-shot (exits 0 after enqueuing; `restart: on-failure` lets it retry through the brief server-startup race); workers and server continue running. The distroless image has no shell (`/bin/sh` is absent), so `docker exec` interactive debugging is not available. Deploying to a live environment (Railway, Fly.io, etc.) is the operator's step; the compose stack is a local demo, not a production-hardened deployment.
 
 ## Redis data model & job lifecycle (the architecture in brief)
 
@@ -162,8 +172,11 @@ internal/metrics/ # ✅ Prometheus Recorder + DepthCollector
 internal/api/ # ✅ JSON REST API handler (Phase 3a)
 internal/client/ # ✅ stdlib-only HTTP producer SDK (Phase 3c)
 web/ # ✅ Vite+React dashboard + web/embed.go (Phase 3b)
-deployments/docker-compose.yml # ◻ redis + server + N workers + demo (Phase 3d)
-.github/workflows/ci.yml # ✅ Redis service + go test -race + golangci-lint + dashboard CI
+Dockerfile # ✅ multi-stage distroless image (Phase 3d)
+.dockerignore # ✅ trims the Docker build context (Phase 3d)
+deployments/docker-compose.yml # ✅ redis + server + N workers + demo (Phase 3d)
+README.md # ✅ portfolio front page with diagram + quickstart (Phase 3d)
+.github/workflows/ci.yml # ✅ Redis service + go test -race + golangci-lint + dashboard CI + docker build
 ```
 
 Use `internal/` for everything not meant as a public import surface. `cmd/` holds only thin
@@ -173,7 +186,7 @@ Use `internal/` for everything not meant as a public import surface. `cmd/` hold
 
 1. **Phase 1 — core: ✅ done.** job model; enqueue/claim/ack/nack Lua; reaper; worker runtime; basic DLQ; integration tests; CI. A working, testable queue ships first.
 2. **Phase 2 — depth: ✅ done.** delayed jobs + promoter ✅; backoff + jitter ✅; priority ✅; idempotency ✅; rate limiting ✅; Prometheus metrics ✅.
-3. **Phase 3 — polish (in progress):** 3a HTTP API + server ✅; 3b dashboard ✅; 3c producer SDK (`internal/client`) ✅; 3d docker-compose + deployed demo + README diagram.
+3. **Phase 3 — polish: ✅ done.** 3a HTTP API + server ✅; 3b dashboard ✅; 3c producer SDK (`internal/client`) ✅; 3d Dockerfile + docker-compose + README ✅.
 4. **Future work (NOT now):** Postgres-backed (`SKIP LOCKED`) mode; exactly-once via consumer outbox.
 
 ## Conventions
@@ -223,6 +236,8 @@ go run ./cmd/demo -server http://localhost:8080 -queue demo -count 100 # enqu
 
 # frontend dev/test (requires Node 20+):
 cd web && npm ci && npm run typecheck && npm run test && npm run build
-```
 
-Keep this section updated as the Makefile / docker-compose take shape.
+# Docker quickstart (all-in-one):
+docker compose -f deployments/docker-compose.yml up --build
+# then open http://localhost:8080 — dashboard, /healthz, and /metrics all available
+```
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..0769c0f
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,26 @@
+# syntax=docker/dockerfile:1
+
+# --- builder ---
+# Pinned to match the toolchain in go.mod (toolchain go1.25.11).
+FROM golang:1.25.11 AS build
+WORKDIR /src
+
+# Cache module downloads.
+COPY go.mod go.sum ./
+RUN go mod download
+
+# Build all three binaries. web/dist is committed, so the server embeds the
+# dashboard with no Node step.
+COPY . .
+RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/server ./cmd/server \
+    && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/worker ./cmd/worker \
+    && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/demo ./cmd/demo
+
+# --- runtime ---
+FROM gcr.io/distroless/static:nonroot
+COPY --from=build /out/server /usr/local/bin/server
+COPY --from=build /out/worker /usr/local/bin/worker
+COPY --from=build /out/demo /usr/local/bin/demo
+EXPOSE 8080
+# Each compose service overrides `command`; default to the server.
+CMD ["/usr/local/bin/server"]
diff --git a/README.md b/README.md
index f582b96..566f692 100644
--- a/README.md
+++ b/README.md
@@ -2,4 +2,126 @@
 
 [![CI](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml/badge.svg)](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml)
 
-Distributed task queue in Go (Redis-backed). Design spec in `docs/superpowers/specs/`.
+A distributed task queue built **from scratch on Redis primitives**, in Go.
The point of this
+project is to prove understanding of queue internals — the atomic claim, visibility timeouts, the
+reaper, retries, priority, idempotency, rate limiting — rather than to wrap an existing library.
+
+## Architecture
+
+```mermaid
+flowchart LR
+  subgraph producers[Producers]
+    SDK["internal/client (Go SDK)"]
+    DEMO["cmd/demo (load gen)"]
+  end
+  SRV["cmd/server
HTTP API · /metrics · /healthz
embedded dashboard"]
+  DASH["Dashboard (web/, embedded)"]
+  subgraph pool["cmd/worker (competing consumers)"]
+    CLAIM["claim loop → handler"]
+    REAP["reaper"]
+    PROM["promoter"]
+  end
+  RDS[("Redis
queues + job hashes")]
+
+  SDK -->|"POST /api/queues/{q}/jobs"| SRV
+  DEMO -->|HTTP| SRV
+  SRV -->|enqueue · stats · DLQ · requeue| RDS
+  SRV -. serves .-> DASH
+  DASH -->|SSE + REST| SRV
+  CLAIM -->|atomic claim / ack / nack| RDS
+  REAP -->|requeue expired in-flight| RDS
+  PROM -->|promote due delayed| RDS
+```
+
+Producers enqueue over HTTP (or the Go SDK); the server is a thin JSON layer over the broker and
+also serves the live dashboard and Prometheus metrics. Workers are competing consumers that claim
+jobs atomically, run a handler, and ack/nack; two background loops (reaper, promoter) plus an
+operator requeue are the only other things that move jobs between states. Redis is the durable
+substrate — every queue guarantee is enforced by our own logic and embedded Lua scripts.
+
+## Delivery semantics & invariants
+
+- **At-least-once delivery, never exactly-once.** Idempotency keys let consumers dedup; nothing here
+  claims exactly-once.
+- **The atomic claim is sacred.** Popping a job from `ready`, adding it to `inflight` under a
+  visibility deadline, and bumping attempts is a single Lua script — competing consumers can never
+  claim the same job.
+- **Crash safety comes from the reaper.** A worker dying mid-job is recovered because its visibility
+  deadline expires and the reaper requeues the job.
+- **Built from scratch on Redis primitives.** The only Go dependencies are a Redis driver and the
+  Prometheus client; the queue logic is ours.
+
+## Features
+
+Competing consumers · priority queues · delayed/scheduled jobs · retries with full-jitter backoff ·
+dead-letter queue with inspect + requeue · visibility timeout + reaper · idempotency keys · per-queue
+rate limiting (token bucket) · Prometheus metrics · live dashboard · a producer SDK.
+
+## Quickstart (Docker)
+
+```bash
+docker compose -f deployments/docker-compose.yml up --build
+```
+
+Then open .
The `demo` container enqueues 200 jobs; the worker processes them
+(failing ~10% so you get retries and a dead-letter queue to watch). The dashboard shows live queue
+depth, throughput, and the DLQ — click **Requeue** on a dead job to send it back. Scale the workers:
+
+```bash
+docker compose -f deployments/docker-compose.yml up --build --scale worker=3
+```
+
+Generate more load any time:
+
+```bash
+docker compose -f deployments/docker-compose.yml run --rm demo \
+  /usr/local/bin/demo -server http://server:8080 -queue demo -count 500
+```
+
+## Local development
+
+Needs Go 1.24+ and a Redis on `localhost:6379` (tests skip when none is reachable).
+
+```bash
+go run ./cmd/server -queues demo # API + dashboard on :8080
+go run ./cmd/worker -queue demo -concurrency 4
+go run ./cmd/demo -server http://localhost:8080 -queue demo -count 100
+
+go test -race ./... # broker/worker/api/client tests use real Redis
+golangci-lint run
+```
+
+The dashboard lives in `web/` (Vite + React + TypeScript); rebuild it with `cd web && npm ci && npm run build` (the built `web/dist` is committed and embedded into the server).
+
+## Project layout
+
+```
+cmd/{server,worker,demo} # thin entrypoints
+internal/job # job model + Redis-hash encoding
+internal/broker # the engine: enqueue/claim/ack/nack/reap/promote + Lua scripts
+internal/worker # consumer runtime (claim loop, reaper, promoter)
+internal/metrics # Prometheus recorder + depth collector
+internal/api # HTTP JSON API + SSE stream
+internal/client # producer SDK (stdlib-only HTTP client)
+web/ # embedded dashboard (Vite + React + TS)
+deployments/ # docker-compose stack
+```
+
+## Deploy
+
+The image is a self-contained binary set, so any container host works. Example (Fly.io-style):
+
+1. Provision a managed Redis and note its address.
+2. Build and push the image: `docker build -t /relay:latest . && docker push /relay:latest`.
+3.
Run the **server** (`/usr/local/bin/server -addr :8080 -redis -queues `),
+   exposing port 8080, and one or more **workers**
+   (`/usr/local/bin/worker -redis -queue `), pointed at the same Redis.
+4. Point producers at the server's URL (the Go SDK, or `cmd/demo -server `).
+
+There is no auth — put it behind your platform's access controls if exposed publicly.
+
+## Design docs
+
+The authoritative designs live in [`docs/superpowers/specs/`](docs/superpowers/specs/); the base
+design is the source of truth for architecture and delivery semantics. `CLAUDE.md` summarizes the
+data model, invariants, and known limitations.
diff --git a/deployments/docker-compose.yml b/deployments/docker-compose.yml
new file mode 100644
index 0000000..07fe551
--- /dev/null
+++ b/deployments/docker-compose.yml
@@ -0,0 +1,52 @@
+# Runs the whole Relay system end-to-end: Redis, the API/dashboard server, a pool
+# of workers (competing consumers + reaper + promoter), and a one-shot demo load
+# generator. Open http://localhost:8080 after `up` to watch the dashboard.
+#
+# docker compose -f deployments/docker-compose.yml up --build
+# docker compose -f deployments/docker-compose.yml up --build --scale worker=3
+
+services:
+  redis:
+    image: redis:7
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 3s
+      timeout: 3s
+      retries: 10
+
+  server:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/server", "-addr", ":8080", "-redis", "redis:6379", "-queues", "demo"]
+    ports:
+      - "8080:8080"
+    depends_on:
+      redis:
+        condition: service_healthy
+
+  worker:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/worker", "-redis", "redis:6379", "-queue", "demo", "-concurrency", "4", "-fail-rate", "0.1"]
+    depends_on:
+      redis:
+        condition: service_healthy
+
+  demo:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/demo", "-server", "http://server:8080", "-queue", "demo", "-count", "200"]
+    depends_on:
+      server:
+        condition: service_started
+    # depends_on only waits for the server container to start, not for its port to
+    # bind. on-failure lets the one-shot demo retry through that brief startup race
+    # (the distroless image has no shell/curl for a proper HTTP healthcheck); once
+    # it succeeds it exits 0 and is not restarted.
+    restart: on-failure
diff --git a/docs/superpowers/plans/2026-06-09-relay-phase3d-packaging-deploy-readme.md b/docs/superpowers/plans/2026-06-09-relay-phase3d-packaging-deploy-readme.md
new file mode 100644
index 0000000..842c603
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-09-relay-phase3d-packaging-deploy-readme.md
@@ -0,0 +1,433 @@
+# Phase 3d Packaging, Deploy & README Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make Relay runnable and presentable in one step — a multi-stage Dockerfile, a docker-compose stack that runs the whole system end-to-end, the portfolio README with a mermaid architecture diagram, and a CI image-build job.
+
+**Architecture:** One multi-stage `Dockerfile` builds all three Go binaries into a tiny distroless image (the server embeds the committed `web/dist`, so no Node step). `deployments/docker-compose.yml` runs redis + server + scalable workers + a one-shot demo from that shared image. The README is the front door; CI gains a `docker build` job so the Dockerfile can't rot. No queue logic changes.
+
+**Tech Stack:** Docker (multi-stage, distroless/static:nonroot), Docker Compose, GitHub Actions, Markdown + mermaid. Go 1.24 / toolchain go1.25.11.
+
+**Spec:** [`docs/superpowers/specs/2026-06-09-relay-phase3d-packaging-deploy-readme-design.md`](../specs/2026-06-09-relay-phase3d-packaging-deploy-readme-design.md)
+
+**Environment note:** Docker (29.x) and Compose (v5) are available in the dev environment, so the build/compose validation steps are real. If Docker is somehow unavailable at execution time, report BLOCKED for the validation steps rather than skipping silently.
+
+---
+
+## File Structure
+
+- **Create `.dockerignore`** — keep the build context lean (exclude `.git`, `web/node_modules`, `.superpowers`, `docs`).
+- **Create `Dockerfile`** (repo root) — multi-stage: golang builder → distroless runtime with all three binaries.
+- **Create `deployments/docker-compose.yml`** — redis + server + worker(s) + demo.
+- **Modify `.github/workflows/ci.yml`** — add a `docker` build job.
+- **Create `README.md`** — the portfolio front page (mermaid diagram, quickstart, features, invariants, deploy, design-docs pointer).
+- **Modify `CLAUDE.md`** — mark Phase 3 complete.
+
+---
+
+## Task 1: `.dockerignore` + `Dockerfile`
+
+**Files:** Create `.dockerignore`, `Dockerfile`
+
+- [ ] **Step 1: Create `.dockerignore`**
+
+```
+.git
+.github
+web/node_modules
+.superpowers
+docs
+*.test
+relay
+```
+
+(Excluding `web/node_modules` is the big win; `docs`/`.superpowers` keep the context small. `web/dist` is NOT excluded — the server embeds it.)
+
+- [ ] **Step 2: Create `Dockerfile`**
+
+```dockerfile
+# syntax=docker/dockerfile:1
+
+# --- builder ---
+FROM golang:1.25 AS build
+WORKDIR /src
+
+# Cache module downloads.
+COPY go.mod go.sum ./
+RUN go mod download
+
+# Build all three binaries. web/dist is committed, so the server embeds the
+# dashboard with no Node step.
+COPY . .
+RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/server ./cmd/server \
+    && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/worker ./cmd/worker \
+    && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/demo ./cmd/demo
+
+# --- runtime ---
+FROM gcr.io/distroless/static:nonroot
+COPY --from=build /out/server /usr/local/bin/server
+COPY --from=build /out/worker /usr/local/bin/worker
+COPY --from=build /out/demo /usr/local/bin/demo
+EXPOSE 8080
+# Each compose service overrides `command`; default to the server.
+CMD ["/usr/local/bin/server"]
+```
+
+- [ ] **Step 3: Build the image (validation)**
+
+Run:
+```bash
+cd /Users/leon/WorkSpace/relay # or the worktree root
+docker build -t relay:ci .
+```
+Expected: build succeeds. Then verify the binaries exist in the image:
+```bash
+docker run --rm --entrypoint /usr/local/bin/server relay:ci -h 2>&1 | head -3 || true
+docker image inspect relay:ci >/dev/null && echo "image OK"
+```
+Expected: `image OK` (the `-h` may exit non-zero printing flag usage — that's fine; it proves the binary runs). If `docker build` fails because `golang:1.25` cannot satisfy the `toolchain go1.25.11` pin, change the builder base to a tag that does (e.g. `golang:1.25.11`) and note the change.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add .dockerignore Dockerfile
+git commit -m "Add multi-stage Dockerfile building server/worker/demo into a distroless image"
+```
+
+---
+
+## Task 2: `deployments/docker-compose.yml`
+
+**Files:** Create `deployments/docker-compose.yml`
+
+- [ ] **Step 1: Create the compose file**
+
+```yaml
+# Runs the whole Relay system end-to-end: Redis, the API/dashboard server, a pool
+# of workers (competing consumers + reaper + promoter), and a one-shot demo load
+# generator. Open http://localhost:8080 after `up` to watch the dashboard.
+#
+# docker compose -f deployments/docker-compose.yml up --build
+# docker compose -f deployments/docker-compose.yml up --build --scale worker=3
+#
+services:
+  redis:
+    image: redis:7
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 3s
+      timeout: 3s
+      retries: 10
+
+  server:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/server", "-addr", ":8080", "-redis", "redis:6379", "-queues", "demo"]
+    ports:
+      - "8080:8080"
+    depends_on:
+      redis:
+        condition: service_healthy
+
+  worker:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/worker", "-redis", "redis:6379", "-queue", "demo", "-concurrency", "4", "-fail-rate", "0.1"]
+    depends_on:
+      redis:
+        condition: service_healthy
+
+  demo:
+    build:
+      context: ..
+      dockerfile: Dockerfile
+    image: relay:local
+    command: ["/usr/local/bin/demo", "-server", "http://server:8080", "-queue", "demo", "-count", "200"]
+    depends_on:
+      server:
+        condition: service_started
+    restart: "no"
+```
+
+(All three app services share one `image: relay:local`, so the build runs once. `worker` has no `container_name` or published port, so `--scale worker=N` works.)
+
+- [ ] **Step 2: Validate the compose file**
+
+Run:
+```bash
+docker compose -f deployments/docker-compose.yml config >/dev/null && echo "compose config OK"
+```
+Expected: `compose config OK` (no schema errors).
+
+- [ ] **Step 3: End-to-end bring-up (validation)**
+
+Run:
+```bash
+docker compose -f deployments/docker-compose.yml up --build -d redis server worker
+# wait for the server to be reachable
+for i in $(seq 1 30); do curl -fsS localhost:8080/healthz >/dev/null 2>&1 && break; sleep 1; done
+curl -s localhost:8080/healthz; echo # ok
+docker compose -f deployments/docker-compose.yml run --rm demo # enqueue 200 jobs
+sleep 3
+curl -s localhost:8080/api/queues; echo # ["demo"]
+curl -s localhost:8080/api/queues/demo/stats; echo # non-zero activity
+curl -s -o /dev/null -w "%{http_code}\n" localhost:8080/ # 200 (dashboard)
+curl -s localhost:8080/metrics | grep -c '^relay_' # >0 relay_ metric lines
+docker compose -f deployments/docker-compose.yml down -v
+```
+Expected: `ok`; `["demo"]`; stats with non-zero ready/processed/dlq movement; `200`; a positive metrics count. If Docker is unavailable, report BLOCKED.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add deployments/docker-compose.yml
+git commit -m "Add docker-compose stack: redis + server + workers + demo"
+```
+
+---
+
+## Task 3: CI `docker` build job
+
+**Files:** Modify `.github/workflows/ci.yml`
+
+- [ ] **Step 1: Add the job**
+
+Append under `jobs:` in `.github/workflows/ci.yml` (same indentation as the existing `test`/`lint`/`web` jobs):
+
+```yaml
+  docker:
+    name: docker build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build image
+        run: docker build -t relay:ci .
+```
+
+Leave the existing `test`, `lint`, and `web` jobs unchanged.
+
+- [ ] **Step 2: Validate the workflow YAML**
+
+Run:
+```bash
+python3 -c "import yaml,sys; yaml.safe_load(open('.github/workflows/ci.yml')); print('ci.yml OK')"
+```
+Expected: `ci.yml OK` (valid YAML; the new job parses). If `python3`/pyyaml is unavailable, instead run `docker build -t relay:ci .` locally to confirm the command the job runs works.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add .github/workflows/ci.yml
+git commit -m "Add CI job that builds the Docker image"
+```
+
+---
+
+## Task 4: `README.md`
+
+**Files:** Create `README.md`
+
+- [ ] **Step 1: Create the README**
+
+Create `README.md` with this content (adjust prose lightly if needed, but keep the structure, the mermaid block, and the commands exact):
+
+````markdown
+# Relay
+
+[![CI](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml/badge.svg)](https://github.com/StrangeNoob/relay/actions/workflows/ci.yml)
+
+A distributed task queue built **from scratch on Redis primitives**, in Go. The point of this
+project is to prove understanding of queue internals — the atomic claim, visibility timeouts, the
+reaper, retries, priority, idempotency, rate limiting — rather than to wrap an existing library.
+
+## Architecture
+
+```mermaid
+flowchart LR
+  subgraph producers[Producers]
+    SDK["internal/client (Go SDK)"]
+    DEMO["cmd/demo (load gen)"]
+  end
+  SRV["cmd/server
HTTP API · /metrics · /healthz
embedded dashboard"]
+  DASH["Dashboard (web/, embedded)"]
+  subgraph pool["cmd/worker (competing consumers)"]
+    CLAIM["claim loop → handler"]
+    REAP["reaper"]
+    PROM["promoter"]
+  end
+  RDS[("Redis
queues + job hashes")]
+
+  SDK -->|"POST /api/queues/{q}/jobs"| SRV
+  DEMO -->|HTTP| SRV
+  SRV -->|enqueue · stats · DLQ · requeue| RDS
+  SRV -. serves .-> DASH
+  DASH -->|SSE + REST| SRV
+  CLAIM -->|atomic claim / ack / nack| RDS
+  REAP -->|requeue expired in-flight| RDS
+  PROM -->|promote due delayed| RDS
+```
+
+Producers enqueue over HTTP (or the Go SDK); the server is a thin JSON layer over the broker and
+also serves the live dashboard and Prometheus metrics. Workers are competing consumers that claim
+jobs atomically, run a handler, and ack/nack; two background loops (reaper, promoter) plus an
+operator requeue are the only other things that move jobs between states. Redis is the durable
+substrate — every queue guarantee is enforced by our own logic and embedded Lua scripts.
+
+## Delivery semantics & invariants
+
+- **At-least-once delivery, never exactly-once.** Idempotency keys let consumers dedup; nothing here
+  claims exactly-once.
+- **The atomic claim is sacred.** Popping a job from `ready`, adding it to `inflight` under a
+  visibility deadline, and bumping attempts is a single Lua script — competing consumers can never
+  claim the same job.
+- **Crash safety comes from the reaper.** A worker dying mid-job is recovered because its visibility
+  deadline expires and the reaper requeues the job.
+- **Built from scratch on Redis primitives.** The only Go dependencies are a Redis driver and the
+  Prometheus client; the queue logic is ours.
+
+## Features
+
+Competing consumers · priority queues · delayed/scheduled jobs · retries with full-jitter backoff ·
+dead-letter queue with inspect + requeue · visibility timeout + reaper · idempotency keys · per-queue
+rate limiting (token bucket) · Prometheus metrics · live dashboard · a producer SDK.
+
+## Quickstart (Docker)
+
+```bash
+docker compose -f deployments/docker-compose.yml up --build
+```
+
+Then open .
The `demo` container enqueues 200 jobs; the worker processes them
+(failing ~10% so you get retries and a dead-letter queue to watch). The dashboard shows live queue
+depth, throughput, and the DLQ — click **Requeue** on a dead job to send it back. Scale the workers:
+
+```bash
+docker compose -f deployments/docker-compose.yml up --build --scale worker=3
+```
+
+Generate more load any time:
+
+```bash
+docker compose -f deployments/docker-compose.yml run --rm demo \
+  /usr/local/bin/demo -server http://server:8080 -queue demo -count 500
+```
+
+## Local development
+
+Needs Go 1.24+ and a Redis on `localhost:6379` (tests skip when none is reachable).
+
+```bash
+go run ./cmd/server -queues demo # API + dashboard on :8080
+go run ./cmd/worker -queue demo -concurrency 4
+go run ./cmd/demo -server http://localhost:8080 -queue demo -count 100
+
+go test -race ./... # broker/worker/api/client tests use real Redis
+golangci-lint run
+```
+
+The dashboard lives in `web/` (Vite + React + TypeScript); rebuild it with `cd web && npm ci && npm run build` (the built `web/dist` is committed and embedded into the server).
+
+## Project layout
+
+```
+cmd/{server,worker,demo} # thin entrypoints
+internal/job # job model + Redis-hash encoding
+internal/broker # the engine: enqueue/claim/ack/nack/reap/promote + Lua scripts
+internal/worker # consumer runtime (claim loop, reaper, promoter)
+internal/metrics # Prometheus recorder + depth collector
+internal/api # HTTP JSON API + SSE stream
+internal/client # producer SDK (stdlib-only HTTP client)
+web/ # embedded dashboard (Vite + React + TS)
+deployments/ # docker-compose stack
+```
+
+## Deploy
+
+The image is a self-contained binary set, so any container host works. Example (Fly.io-style):
+
+1. Provision a managed Redis and note its address.
+2. Build and push the image: `docker build -t /relay:latest . && docker push /relay:latest`.
+3.
Run the **server** (`/usr/local/bin/server -addr :8080 -redis -queues `),
+   exposing port 8080, and one or more **workers**
+   (`/usr/local/bin/worker -redis -queue `), pointed at the same Redis.
+4. Point producers at the server's URL (the Go SDK, or `cmd/demo -server `).
+
+There is no auth — put it behind your platform's access controls if exposed publicly.
+
+## Design docs
+
+The authoritative designs live in [`docs/superpowers/specs/`](docs/superpowers/specs/); the base
+design is the source of truth for architecture and delivery semantics. `CLAUDE.md` summarizes the
+data model, invariants, and known limitations.
+````
+
+- [ ] **Step 2: Sanity-check the README**
+
+Run:
+```bash
+test -f README.md && echo "README OK"
+grep -c '```mermaid' README.md # expect 1
+grep -c '```' README.md # expect an even number of fences
+```
+Expected: `README OK`, `1`, and an even fence count (balanced code blocks). Eyeball the mermaid block for syntax (one `flowchart LR`, matched `subgraph`/`end`).
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add README.md
+git commit -m "Add portfolio README with architecture diagram and quickstart"
+```
+
+---
+
+## Task 5: Update CLAUDE.md and final verification
+
+**Files:** Modify `CLAUDE.md`
+
+- [ ] **Step 1: Update CLAUDE.md**
+
+Make these edits (match the file's wording/structure):
+1. **Status line** — Phase 3 is now **complete**: 3a ✅, 3b ✅, 3c ✅, 3d ✅. The project's planned scope (Phases 1–3) is done; only the "Future work" items (Postgres mode, exactly-once outbox) remain, which were always out of scope.
+2. **"What exists today" list** — add: `Dockerfile` (multi-stage, distroless), `deployments/docker-compose.yml` (redis + server + workers + demo), and `README.md` (portfolio front page). Note the CI `docker build` job.
+3. **Layout (✅/◻)** — mark `deployments/docker-compose.yml` ✅ and add `Dockerfile` ✅, `README.md` ✅. There should be no remaining ◻ Phase-3 items.
+4.
**Build order** — Phase 3: 3a ✅, 3b ✅, 3c ✅, 3d ✅ (packaging/deploy/README done). Phase 3 complete.
+5. **Build & dependencies / run commands** — add the docker quickstart: `docker compose -f deployments/docker-compose.yml up --build` → open `http://localhost:8080`.
+6. **Known limitations** — add a packaging note: the compose Redis is ephemeral (no volume); the `demo` service is one-shot; the distroless image has no shell; live hosting is the operator's step.
+
+Keep claims accurate; do not contradict invariants.
+
+- [ ] **Step 2: Full verification**
+
+Run:
+```bash
+go build ./...
+go test -race ./...
+go vet ./...
+gofmt -l internal/ cmd/
+docker build -t relay:ci .
+docker compose -f deployments/docker-compose.yml config >/dev/null && echo "compose OK"
+```
+Expected: Go build/tests/vet/fmt clean (broker DB 15, worker DB 14, metrics DB 13, api DB 12, client DB 11; needs Redis on :6379); docker image builds; compose validates. If Docker is unavailable, report which steps were skipped.
+
+If anything fails, STOP and report.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add CLAUDE.md
+git commit -m "Document Phase 3d: packaging, deploy, README (Phase 3 complete)"
+```
+
+---
+
+## Self-Review (completed during planning)
+
+- **Spec coverage:** `.dockerignore` + multi-stage `Dockerfile` (Task 1); `deployments/docker-compose.yml` with redis/server/worker/demo + end-to-end validation (Task 2); CI `docker build` job (Task 3); portfolio `README.md` with the mermaid diagram, quickstart, features, invariants, deploy, design-docs pointer (Task 4); CLAUDE.md → Phase 3 complete + final verification (Task 5). Every spec section maps to a task.
+- **Consistency:** compose `command`s use the exact binary paths from the Dockerfile (`/usr/local/bin/{server,worker,demo}`) and the real flags (`server -addr/-redis/-queues`, `worker -redis/-queue/-concurrency/-fail-rate`, `demo -server/-queue/-count`); the shared `image: relay:local` builds once; `build.context: ..` is correct because the compose file is in `deployments/` while the Dockerfile is at the repo root; README commands match the compose file. +- **No placeholders:** every file's full content is given; validation steps use concrete commands with expected output. +- **No queue-logic changes:** this phase is packaging + docs only; the Go module and its tests are unchanged, so the existing suite must stay green. diff --git a/docs/superpowers/specs/2026-06-09-relay-phase3d-packaging-deploy-readme-design.md b/docs/superpowers/specs/2026-06-09-relay-phase3d-packaging-deploy-readme-design.md new file mode 100644 index 0000000..b5ad0a1 --- /dev/null +++ b/docs/superpowers/specs/2026-06-09-relay-phase3d-packaging-deploy-readme-design.md @@ -0,0 +1,160 @@ +# Relay — Phase 3d: Packaging, Deploy & README + +**Status:** Approved design · **Date:** 2026-06-09 +**Parent spec:** [`2026-06-07-relay-distributed-task-queue-design.md`](2026-06-07-relay-distributed-task-queue-design.md) +**Depends on:** 3a HTTP API ✅, 3b dashboard ✅, 3c producer SDK ✅ +**Phase:** 3 (polish) — fourth and final sub-project. Completing it closes the project's planned scope. + +## Purpose + +Make Relay runnable and presentable in one step: a multi-stage Dockerfile, a docker-compose stack +that runs the whole system end-to-end (Redis + server + workers + a load-generating demo), and the +portfolio README — the front door that explains what Relay is, shows its architecture, and tells a +visitor how to run it. The compose stack is the runnable "demo"; live hosting is documented but +remains the operator's manual step. 
+ +## Scope + +In scope: + +- `Dockerfile` (multi-stage) building all three binaries into one minimal image. +- `.dockerignore` to keep the build context lean. +- `deployments/docker-compose.yml` running redis + server + worker(s) + demo end-to-end. +- `README.md`: the project front page, including a mermaid architecture diagram, quickstart, local + dev, feature list, invariants, a deploy section, and a link to the design docs. +- A CI job that builds the Docker image (so the Dockerfile cannot rot). + +Out of scope: actually deploying to a live host (documented, but the operator runs it — no +credentials here); a LICENSE file (the author's IP choice); Kubernetes manifests; TLS/auth (the API +is demo-grade); pushing images to a registry. + +## Key decisions + +| Decision | Choice | Rationale | +|---|---|---| +| Image | **One multi-stage Dockerfile, shared image** | All three binaries from one build; compose services run the right one via `command`. The server embeds the committed `web/dist`, so no Node step is needed. | +| Runtime base | **`gcr.io/distroless/static:nonroot`** | Tiny, no shell/package surface, ideal for `CGO_ENABLED=0` static Go binaries; runs as non-root. | +| Demo | **docker-compose stack is the demo** | `docker compose up` brings up the full system and the demo container generates load; the dashboard shows it live. Reproducible, no secrets. | +| Deploy | **Documented in README (Fly.io-style); operator runs it** | A live URL needs a hosting account/credentials not available here; the image + compose make deployment straightforward, and the README gives concrete steps. | +| Diagram | **Mermaid in the README** | Renders natively on GitHub, lives as diffable text, no binary asset to drift. | +| Dockerfile CI | **Add a `docker build` job** | Keeps the Dockerfile working as the code evolves; build only (no push), so no registry creds. 
| + +## Components & changes + +### `Dockerfile` (repo root, multi-stage) + +```dockerfile +# Builder +FROM golang:1.25 AS build +WORKDIR /src +COPY go.mod go.sum ./ +RUN go mod download +COPY . . +# web/dist is committed, so the server embeds the dashboard with no Node step. +RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/server ./cmd/server \ + && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/worker ./cmd/worker \ + && CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/demo ./cmd/demo + +# Runtime +FROM gcr.io/distroless/static:nonroot +COPY --from=build /out/server /usr/local/bin/server +COPY --from=build /out/worker /usr/local/bin/worker +COPY --from=build /out/demo /usr/local/bin/demo +EXPOSE 8080 +# No fixed entrypoint binary: each compose service sets `command`. Default to the server. +CMD ["/usr/local/bin/server"] +``` + +The exact Go base tag should match `go.mod` (`go 1.24` / `toolchain go1.25.11`); use a `golang:1.25` +tag that resolves the pinned toolchain (the plan verifies the build works against the available +image). + +### `.dockerignore` + +Excludes at least `.git`, `web/node_modules`, `.superpowers`, `docs`, and local build artifacts. +`*.md` is **not** excluded (the README is small and fine to copy). The key win is excluding +`web/node_modules` (large) and `.git`. + +### `deployments/docker-compose.yml` + +Services (all but redis built from the root `Dockerfile`): + +- **redis** — `redis:7`, healthcheck `redis-cli ping` (interval/timeout/retries), so dependents wait + for readiness. +- **server** — `command: ["/usr/local/bin/server", "-addr", ":8080", "-redis", "redis:6379", + "-queues", "demo"]`; `ports: ["8080:8080"]`; `depends_on: redis (service_healthy)`. +- **worker** — `command: ["/usr/local/bin/worker", "-redis", "redis:6379", "-queue", "demo", + "-concurrency", "4", "-fail-rate", "0.1"]`; `depends_on: redis (service_healthy)`. 
Scalable with + `docker compose up --scale worker=N` (the worker is a competing consumer; multiple are safe). +- **demo** — `command: ["/usr/local/bin/demo", "-server", "http://server:8080", "-queue", "demo", + "-count", "200"]`; `depends_on: server (service_started)`; `restart: "no"` (one-shot load). + +Topology matches the system: the demo (SDK over HTTP) talks to the server's API; the server and +workers talk to Redis; workers are the competing consumers and also run the reaper/promoter loops. +The 10% fail-rate produces retries and DLQ entries, so the dashboard's DLQ view and requeue are +demonstrable. + +### `README.md` + +Sections, in order: + +1. **Title + tagline + CI badge** — "Relay — a distributed task queue built from scratch on Redis, + in Go." Badge points at the CI workflow. +2. **What it is** — portfolio/back-end showcase; the point is proving queue internals, not wrapping a + library. +3. **Architecture** — a mermaid diagram: producer / SDK / demo → server (HTTP API + embedded + dashboard + `/metrics`); server and workers ↔ Redis; workers run the claim loop + reaper + + promoter. A short prose walk-through follows. +4. **Delivery semantics & invariants** — at-least-once (never exactly-once); the atomic claim is + sacred (one Lua script); crash safety via the reaper (visibility deadline); build-from-scratch on + Redis primitives. +5. **Features** — competing consumers, priority, delayed/scheduled, retries with full-jitter backoff, + DLQ + inspect/requeue, visibility timeout + reaper, idempotency keys, per-queue rate limiting, + Prometheus metrics, live dashboard, producer SDK. +6. **Quickstart** — `docker compose -f deployments/docker-compose.yml up --build`, then open + `http://localhost:8080`; what you'll see (depth/throughput, DLQ to requeue); scale workers. +7. **Local development** — `go run ./cmd/server` / `./cmd/worker` / `./cmd/demo`; tests need Redis on + `:6379`, run `go test -race ./...`; the dashboard is `web/` (Vite) with `npm` for changes. +8. 
**Project layout** — brief map of `internal/{job,broker,worker,metrics,api,client}`, `cmd/*`, + `web/`, `deployments/`. +9. **Deploy** — container-host instructions (Fly.io-style: build/push the image, set `REDIS_ADDR`, + run server + worker; point at a managed Redis). Noted as the operator's step. +10. **Design docs** — pointer to `docs/superpowers/specs/` (the authoritative designs) and a note + that the spec is the source of truth. + +### CI (`.github/workflows/ci.yml`) + +Add a `docker` job: checkout → `docker build -t relay:ci .` (build only, no push). Keeps the +Dockerfile honest. The existing `test`, `lint`, and `web` jobs are unchanged. + +## Testing / validation + +Docker is available in the dev environment, so validation is real (not just review): + +- `docker build -t relay:ci .` succeeds and produces all three binaries in the image. +- `docker compose -f deployments/docker-compose.yml config` validates the compose file. +- `docker compose ... up` brings the stack up: redis healthy → server up → workers consuming → demo + enqueues 200 jobs. Then assert: `curl localhost:8080/healthz` → `ok`; + `curl localhost:8080/api/queues` includes `demo`; `curl localhost:8080/api/queues/demo/stats` + shows non-zero activity; `curl localhost:8080/` returns the dashboard HTML; + `curl localhost:8080/metrics` has `relay_*` lines. Tear down with `docker compose down -v`. +- README: prose reviewed; mermaid block syntax sanity-checked (fenced ```mermaid```); links resolve. +- `go build ./...` and `go test -race ./...` remain green (no Go source changes are expected in this + phase; if the demo/server need a tweak for container DNS, it is minimal). + +## Invariants preserved + +- No queue logic changes — this phase is packaging and docs. At-least-once, the atomic claim, and + reaper crash-safety are untouched. +- No new Go dependency. The Docker image is built from the existing module; the README/compose add no + code dependencies. 
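
The end-to-end `curl` assertions in the Testing / validation list could also be expressed as a small Go smoke check. The following is a minimal sketch, not part of the plan or the repo: the names `check`, `smokeChecks`, `smokeCheck`, and `newFakeRelay` are hypothetical, the fake server's response bodies (including the `relay_jobs_total` metric name) are invented placeholders, and because the spec does not pin down the stats or dashboard payloads, those two endpoints are only checked for a 200 status. It uses the standard library only, so it would not add a Go dependency.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// check pairs an endpoint path with a substring its body must contain.
// An empty want means any 200 response is accepted.
type check struct {
	path string
	want string
}

// smokeChecks mirrors the curl assertions from the validation list.
var smokeChecks = []check{
	{"/healthz", "ok"},
	{"/api/queues", "demo"},
	{"/api/queues/demo/stats", ""}, // payload shape not specified: status only
	{"/", ""},                      // dashboard HTML: status only
	{"/metrics", "relay_"},
}

// smokeCheck requests every path under baseURL and returns one message per
// failed check; an empty result means every check passed.
func smokeCheck(baseURL string, checks []check) []string {
	client := &http.Client{Timeout: 5 * time.Second}
	var failures []string
	for _, c := range checks {
		resp, err := client.Get(baseURL + c.path)
		if err != nil {
			failures = append(failures, fmt.Sprintf("%s: %v", c.path, err))
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		switch {
		case err != nil:
			failures = append(failures, fmt.Sprintf("%s: read body: %v", c.path, err))
		case resp.StatusCode != http.StatusOK:
			failures = append(failures, fmt.Sprintf("%s: status %d", c.path, resp.StatusCode))
		case !strings.Contains(string(body), c.want):
			failures = append(failures, fmt.Sprintf("%s: body missing %q", c.path, c.want))
		}
	}
	return failures
}

// newFakeRelay stands in for the real server so the sketch runs without
// Docker. The bodies below are made-up placeholders, not Relay's payloads.
func newFakeRelay() *httptest.Server {
	bodies := map[string]string{
		"/healthz":               "ok",
		"/api/queues":            `["demo"]`,
		"/api/queues/demo/stats": `{"processed": 42}`,
		"/":                      "<!doctype html><title>Relay</title>",
		"/metrics":               "relay_jobs_total 42\n",
	}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, ok := bodies[r.URL.Path]
		if !ok {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newFakeRelay()
	defer srv.Close()

	failures := smokeCheck(srv.URL, smokeChecks)
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	if len(failures) == 0 {
		fmt.Println("smoke OK")
	}
}
```

To run the same checks against the compose stack instead of the fake, one would pass `http://localhost:8080` as `baseURL` once the demo has finished enqueuing.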
+ +## Known limitations + +- **No live hosting in-repo.** The compose stack is the demo; deploying to a public URL is the + operator's step (documented). +- **Demo service runs as a one-shot.** The `demo` service exits after enqueuing; re-run with + `docker compose run --rm demo ...` to generate more load. +- **distroless runtime has no shell.** Debugging inside the container needs alternatives to + `docker compose exec` (or a temporary alpine base); this was chosen deliberately for a minimal image. +- **Single Redis, no persistence tuning.** The compose Redis is ephemeral (no volume by default); + fine for a demo, not a production data store.