
Phase 2: Prometheus metrics #5

Merged
StrangeNoob merged 19 commits into main from phase2-metrics
Jun 8, 2026


@StrangeNoob

Owner

Summary

Completes Phase 2 by making the engine observable. Adds opt-in Prometheus instrumentation for every job-state transition plus live per-queue depth gauges, with a /metrics endpoint on cmd/worker. No Lua scripts change; the atomic claim and at-least-once delivery are untouched — metrics are pure observation.

  • internal/broker — a small Metrics interface (consumer-side contract) with a noopMetrics default and a WithMetrics(m) option. Enqueue/Claim/Ack/Nack/Reap/Promote record after the Redis op succeeds (enqueued, deduplicated, claimed, processed, retried, dead, reaped, promoted, end-to-end latency). Off by default → behaviour byte-identical when unused.
  • internal/metrics (new package) — a Prometheus Recorder implementing broker.Metrics over a private registry (counters relay_jobs_*_total, histogram relay_job_latency_seconds, all labelled by queue), and a pull-based DepthCollector reporting relay_queue_depth{queue,state} via ZCARD/LLEN at scrape time (skip-on-error, never stale). The package does not import internal/broker — it satisfies the interface structurally, keeping the dependency arrow one-way.
  • cmd/worker — a --metrics-addr flag (default "" = off). When set, builds the recorder, registers the depth collector, and serves /metrics via promhttp, shutting the server down after the worker pool drains so the final scrape sees the last batch.
  • Dependency — adds github.com/prometheus/client_golang (an instrumentation library, not a queue library — the build-from-scratch rule is intact). CLAUDE.md updated to Phase 2 complete.
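The opt-in wiring described in the first bullet can be pictured with a minimal sketch. It uses only the standard library and no Redis. The `Metrics`, `noopMetrics`, and `WithMetrics` names come from the summary above; the method signatures, the `Broker` fields, the nil handling, and the `countingMetrics` recorder are assumptions made for illustration and may differ from the real code.

```go
package main

import "fmt"

// Metrics is the consumer-side contract the broker records against.
// Only two of the transitions are shown; the signatures are assumptions.
type Metrics interface {
	IncEnqueued(queue string)
	IncDeduplicated(queue string)
}

// noopMetrics is the default recorder: every call does nothing.
type noopMetrics struct{}

func (noopMetrics) IncEnqueued(string)     {}
func (noopMetrics) IncDeduplicated(string) {}

// Broker is a stand-in for the real broker; the Redis client is omitted.
type Broker struct {
	metrics Metrics
}

// Option configures a Broker at construction time.
type Option func(*Broker)

// WithMetrics installs m as the recorder. Ignoring nil is an assumption
// made here so the no-op default is never replaced by a nil interface.
func WithMetrics(m Metrics) Option {
	return func(b *Broker) {
		if m != nil {
			b.metrics = m
		}
	}
}

// New builds a Broker that records nothing unless WithMetrics is passed.
func New(opts ...Option) *Broker {
	b := &Broker{metrics: noopMetrics{}}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// countingMetrics is a hypothetical recorder. It never names Metrics, yet it
// satisfies the interface structurally, which is what lets a recorder
// package avoid importing the broker package.
type countingMetrics struct {
	enqueued     map[string]int
	deduplicated map[string]int
}

func newCountingMetrics() *countingMetrics {
	return &countingMetrics{
		enqueued:     map[string]int{},
		deduplicated: map[string]int{},
	}
}

func (c *countingMetrics) IncEnqueued(queue string)     { c.enqueued[queue]++ }
func (c *countingMetrics) IncDeduplicated(queue string) { c.deduplicated[queue]++ }

func main() {
	rec := newCountingMetrics()
	b := New(WithMetrics(rec))
	b.metrics.IncEnqueued("default")
	fmt.Println("enqueued on default:", rec.enqueued["default"])
}
```

Because the default is a value that does nothing, call sites can record unconditionally without nil checks, which is one plausible way to keep the unused path unchanged.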

Design & plan

  • Spec: docs/superpowers/specs/2026-06-08-relay-phase2-metrics-design.md
  • Plan: docs/superpowers/plans/2026-06-08-relay-phase2-metrics.md

Test plan

  • go build ./... clean
  • go vet ./... clean
  • gofmt -l internal/ cmd/ clean
  • go test -race ./... — broker (DB 15), worker (DB 14), metrics (DB 13), job all pass against a real Redis (test DBs isolated so go test ./... is parallel-safe)
  • Instrumentation tests assert exact per-transition counts and that non-events record nothing
  • DepthCollector verified via testutil.CollectAndCompare; --metrics-addr smoke-tested serving relay_queue_depth

Replace the vacuous noopMetrics identity check with a sentinel
distinctMetrics pointer so TestWithMetricsInstallsRecorder actually
fails when WithMetrics does not install the recorder.
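One way such an identity check can be vacuous, sketched with the standard library only: an empty struct like `noopMetrics` compares equal to every other value of its type, so a broker that ignores the option still "holds" the expected recorder. The sketch below assumes this is the failure mode the commit addresses. `brokenWithMetrics` and `installed` are hypothetical helpers, and the `distinctMetrics` field is an assumption (a non-zero size guarantees each allocation has its own address).

```go
package main

import "fmt"

// Metrics is reduced to one method for the sketch.
type Metrics interface{ IncEnqueued(queue string) }

// noopMetrics is an empty struct: all of its values compare equal.
type noopMetrics struct{}

func (noopMetrics) IncEnqueued(string) {}

// distinctMetrics is the sentinel type. The id field gives it a non-zero
// size, so two separately allocated pointers are never equal.
type distinctMetrics struct{ id int }

func (*distinctMetrics) IncEnqueued(string) {}

type Broker struct{ metrics Metrics }

type Option func(*Broker)

// WithMetrics is the correct option: it installs m.
func WithMetrics(m Metrics) Option { return func(b *Broker) { b.metrics = m } }

// brokenWithMetrics simulates the regression the test must catch:
// it silently leaves the default in place.
func brokenWithMetrics(_ Metrics) Option { return func(*Broker) {} }

func New(opts ...Option) *Broker {
	b := &Broker{metrics: noopMetrics{}}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// installed reports whether the broker holds exactly the recorder passed in.
func installed(b *Broker, want Metrics) bool { return b.metrics == want }

func main() {
	// Vacuous: passes even though nothing was installed.
	fmt.Println(installed(New(brokenWithMetrics(noopMetrics{})), noopMetrics{}))

	sentinel := &distinctMetrics{id: 1}
	// Meaningful: the broken option is detected, the real one passes.
	fmt.Println(installed(New(brokenWithMetrics(sentinel)), sentinel))
	fmt.Println(installed(New(WithMetrics(sentinel)), sentinel))
}
```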

Also tighten the ObserveLatency doc comment: "enqueue -> ack" → "creation
-> ack" to match the implementation (latency is measured from job.CreatedAt,
set in job.New, not from the enqueue call).

Add fakeMetrics recorder and newTestBrokerWith helper for option-injecting
tests. Wire IncEnqueued and IncDeduplicated into broker.Enqueue so every
successful enqueue and every dropped duplicate is counted per queue.
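A rough sketch of that wiring, using an in-memory set in place of Redis. `fakeMetrics`, `IncEnqueued`, and `IncDeduplicated` are named in the commit message; the `store` type, its `add` method, the `fail` switch, and all signatures are assumptions introduced only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

type Metrics interface {
	IncEnqueued(queue string)
	IncDeduplicated(queue string)
}

// fakeMetrics counts calls per queue so a test can assert exact totals.
type fakeMetrics struct {
	enqueued     map[string]int
	deduplicated map[string]int
}

func newFakeMetrics() *fakeMetrics {
	return &fakeMetrics{enqueued: map[string]int{}, deduplicated: map[string]int{}}
}

func (f *fakeMetrics) IncEnqueued(queue string)     { f.enqueued[queue]++ }
func (f *fakeMetrics) IncDeduplicated(queue string) { f.deduplicated[queue]++ }

var errStore = errors.New("store unavailable")

// store is a hypothetical stand-in for the Redis operation. add reports
// whether the job was new (true) or an already-seen duplicate (false).
type store struct {
	seen map[string]bool
	fail bool
}

func (s *store) add(queue, id string) (bool, error) {
	if s.fail {
		return false, errStore
	}
	key := queue + "/" + id
	if s.seen[key] {
		return false, nil
	}
	s.seen[key] = true
	return true, nil
}

type Broker struct {
	store   *store
	metrics Metrics
}

// Enqueue records a metric only after the store operation has succeeded,
// so a failed call is a non-event and leaves every counter untouched.
func (b *Broker) Enqueue(queue, id string) error {
	added, err := b.store.add(queue, id)
	if err != nil {
		return err
	}
	if added {
		b.metrics.IncEnqueued(queue)
	} else {
		b.metrics.IncDeduplicated(queue)
	}
	return nil
}

func main() {
	m := newFakeMetrics()
	b := &Broker{store: &store{seen: map[string]bool{}}, metrics: m}
	_ = b.Enqueue("default", "job-1")
	_ = b.Enqueue("default", "job-1") // duplicate of the first call
	fmt.Println(m.enqueued["default"], m.deduplicated["default"])
}
```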

nack.lua already returns "retry" or "dead" to indicate which branch was
taken. Switch Nack from .Err() to .Text() to capture that return value,
then call IncRetried/IncDead on the Metrics interface accordingly.

Tests (TestNackWithRetriesLeftRecordsRetried, TestNackWithBudgetSpentRecordsDead)
were written first and confirmed RED before the one-line behavior change made
them GREEN.
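The dispatch on the script's return value might look roughly like the sketch below. It takes the result as a plain string, the way a `.Text()` call would yield it, so no Redis client is needed. `recordNack` is a hypothetical helper, the signatures are assumptions, and treating an unrecognised result as an error is a choice made for this sketch, not something the commit message states.

```go
package main

import "fmt"

type Metrics interface {
	IncRetried(queue string)
	IncDead(queue string)
}

// fakeMetrics counts the two Nack outcomes.
type fakeMetrics struct {
	retried int
	dead    int
}

func (f *fakeMetrics) IncRetried(string) { f.retried++ }
func (f *fakeMetrics) IncDead(string)    { f.dead++ }

// recordNack maps the script's textual result onto the matching counter.
// An unrecognised result records nothing and is reported to the caller.
func recordNack(m Metrics, queue, result string) error {
	switch result {
	case "retry":
		m.IncRetried(queue)
	case "dead":
		m.IncDead(queue)
	default:
		return fmt.Errorf("unexpected nack result %q", result)
	}
	return nil
}

func main() {
	m := &fakeMetrics{}
	_ = recordNack(m, "default", "retry") // retries left
	_ = recordNack(m, "default", "dead")  // retry budget spent
	fmt.Println("retried:", m.retried, "dead:", m.dead)
}
```

Reading only the error, as the earlier `.Err()` call did, discards which branch the script took, which is why capturing the text is enough to drive both counters.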

@StrangeNoob

Owner Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

StrangeNoob merged commit 8a91b86 into main on Jun 8, 2026
2 checks passed
StrangeNoob deleted the phase2-metrics branch on June 8, 2026 14:03