Release regression detection for engineering teams.
WatchDog answers one question after every deploy: did this release break something?
It correlates a deploy event with post-deploy changes in error rate, latency, and log signatures, then saves a triage-ready incident with evidence, notes, exports, and an explanation.
The hosted Vercel demo also runs an evidence-bounded triage agent automatically after each deploy event and persists the agent report to Supabase.
The intended user is an engineer, SRE, or engineering manager who needs to understand whether a release caused customer-facing risk without reading raw metrics and logs first.
- Solves a real production problem that every backend team understands
- Uses Rust for a low-overhead, always-on streaming process
- Demonstrates event correlation, rolling baselines, anomaly detection, and alerting
- Produces measurable benchmark output instead of vague claims
- Watches a metrics stream from a JSONL source in the MVP
- Accepts deploy notifications from a CLI command or deploy script hook
- Builds a rolling baseline from recent pre-deploy samples
- Runs CUSUM change detection on error rate and latency
- Attributes suspicious shifts and repeated new error signatures to a specific deploy
- Persists incident records with status, notes, explanation cache, and export endpoints
- Serves a product dashboard for incident review and deploy-event demos
- Runs autonomous deploy triage in the hosted demo: detect, explain, recommend, and store the audit trail
- Emits a human-readable verdict to stdout, webhook, and the dashboard
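The rolling baseline mentioned in the list above can be pictured as a fixed-capacity buffer of recent samples whose means are snapshotted when a deploy arrives. The following is a minimal sketch under that assumption, not the code in src/buffer.rs; the `Sample` and `RollingBaseline` types and their methods are hypothetical names chosen for illustration.

```rust
use std::collections::VecDeque;

/// One metric sample; field names mirror the example sample in this README.
#[derive(Debug, Clone, Copy)]
struct Sample {
    error_rate: f64,
    p95_latency_ms: f64,
}

/// Fixed-capacity buffer of the most recent samples (hypothetical type).
struct RollingBaseline {
    capacity: usize,
    samples: VecDeque<Sample>,
}

impl RollingBaseline {
    fn new(capacity: usize) -> Self {
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Adds a sample, evicting the oldest one once the buffer is full.
    fn push(&mut self, sample: Sample) {
        if self.capacity == 0 {
            return;
        }
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    /// Mean error rate and mean p95 latency, or None while the buffer is empty.
    fn snapshot(&self) -> Option<(f64, f64)> {
        if self.samples.is_empty() {
            return None;
        }
        let n = self.samples.len() as f64;
        let error_rate = self.samples.iter().map(|s| s.error_rate).sum::<f64>() / n;
        let latency = self.samples.iter().map(|s| s.p95_latency_ms).sum::<f64>() / n;
        Some((error_rate, latency))
    }
}
```

With a capacity such as the `baseline_capacity` value from the config file, the snapshot only ever reflects the most recent samples, so an old incident does not skew the comparison for a later deploy.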
flowchart TD
A["Metric samples arrive"] --> B["Rolling baseline buffer"]
C["Deploy event arrives"] --> D["Snapshot pre-deploy baseline"]
D --> E["Open post-deploy monitoring window"]
B --> E
E --> F["Run CUSUM on error rate and latency"]
F --> G{"Regression detected?"}
G -- "No" --> H["Keep monitoring"]
G -- "Yes" --> I["Correlate metric shift to deploy timestamp"]
I --> J["Generate plain-English verdict"]
J --> K["Generate explanation and triage report"]
K --> L["Persist incident and audit trail"]
L --> M["Dashboard triage, notes, status, exports"]
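The "Run CUSUM" step in the flow above can be illustrated with a one-sided (upward) CUSUM: accumulate how far each post-deploy value exceeds the baseline plus a small drift allowance, and raise an alarm once the accumulated excess crosses a threshold. This is a sketch of the textbook technique, not the code in src/detector.rs. The `Cusum` type and `first_alarm` helper are hypothetical, and treating the config's drift and threshold values as the CUSUM slack and decision threshold is an assumption.

```rust
/// One-sided (upward) CUSUM detector over a single metric (hypothetical type).
struct Cusum {
    baseline: f64,  // pre-deploy mean of the metric
    drift: f64,     // slack subtracted from every deviation
    threshold: f64, // alarm level for the cumulative sum
    sum: f64,       // running cumulative excess, never below zero
}

impl Cusum {
    fn new(baseline: f64, drift: f64, threshold: f64) -> Self {
        Self { baseline, drift, threshold, sum: 0.0 }
    }

    /// Feeds one post-deploy value; returns true once the cumulative
    /// excess over `baseline + drift` is above the threshold.
    fn update(&mut self, value: f64) -> bool {
        self.sum = (self.sum + value - self.baseline - self.drift).max(0.0);
        self.sum > self.threshold
    }
}

/// Index of the first value that raises the alarm, if any.
fn first_alarm(detector: &mut Cusum, values: &[f64]) -> Option<usize> {
    values.iter().position(|&v| detector.update(v))
}
```

Because the sum resets to zero whenever values sit at or below the baseline, brief noise does not accumulate, while a sustained shift crosses the threshold after a few samples.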
- src/app.rs: CLI entrypoints and runtime orchestration
- src/engine.rs: deploy correlation state machine
- src/detector.rs: CUSUM-based change detection
- src/buffer.rs: rolling metric baseline buffer
- src/alert.rs: alert rendering and webhook delivery
- src/benchmark.rs: deterministic benchmark scenarios
- src/dashboard.rs: hosted dashboard, health endpoint, incident APIs, and demo scenario trigger
- src/storage.rs: durable incident persistence with Supabase, SQLite, and JSON-file modes
Create a ready-to-demo bad deploy incident:
cargo run -- demo
cargo run -- serve --state-dir .watchdog-demo --port 3001
Open http://127.0.0.1:3001, then use the dashboard to:
- Run a checkout or payments deploy regression scenario from the sidebar
- Select the saved incident from history
- Generate or refresh the explanation
- Add investigation notes and mark the incident resolved
- Export Markdown or JSON for handoff
For hosted demos, see deploy/README.md:
- Vercel hosted product demo from vercel-demo/ with serverless APIs and Supabase persistence
- Dockerized Rust dashboard for Render, Railway, Fly.io, or any Docker host
- Deployment notes for persistent state and lightweight explanations
The live dashboard exposes GET /healthz for deployment checks.
Environment examples are in .env.example.
Use SQLite-backed demo storage locally:
WATCHDOG_STORAGE=sqlite \
WATCHDOG_DATABASE_URL=.watchdog-demo/watchdog.sqlite \
WATCHDOG_EXPLAINER=local \
cargo run -- serve --state-dir .watchdog-demo --port 3001
Use Supabase-backed storage:
WATCHDOG_STORAGE=supabase \
SUPABASE_URL=https://your-project.supabase.co \
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key \
WATCHDOG_EXPLAINER=local \
cargo run -- serve --state-dir .watchdog-demo --port 3001
Create the Supabase table first:
create table if not exists incidents (
  id text primary key,
  created_at timestamptz not null,
  severity text not null,
  status text not null default 'open',
  deploy_id text not null,
  environment text not null,
  summary text not null,
  incident_json jsonb not null,
  updated_at timestamptz not null default now()
);
create index if not exists idx_incidents_created_at
  on incidents (created_at desc);
create index if not exists idx_incidents_status
  on incidents (status);
create index if not exists idx_incidents_deploy_id
  on incidents (deploy_id);
Run the streaming synthetic bad deploy demo:
cargo run -- simulate --state-dir .WatchDog --deploy v1.4.2 --bad-deploy
cargo run -- run --state-dir .WatchDog
Run with a JSON config file:
cargo run -- run --state-dir .WatchDog --config watchdog.config.json
Example config:
{
  "baseline_capacity": 120,
  "monitoring_window_secs": 300,
  "log_file": ".WatchDog/app.log",
  "webhook_url": "https://hooks.example.test/watchdog",
  "detector": {
    "error_threshold": 0.08,
    "error_drift": 0.002,
    "latency_threshold": 120.0,
    "latency_drift": 5.0
  }
}
CLI flags such as --log-file, --monitoring-window-secs, and --webhook-url override config file values.
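The precedence rule above (a CLI flag wins over the same setting in the config file) can be sketched as a field-by-field merge of two optional sets of values. This is an illustration only; the `Overrides` struct and `resolve` function are hypothetical and cover just the three flags named above.

```rust
/// Settings that may come from either the CLI or the config file.
/// Hypothetical struct; fields mirror the flags named in this README.
#[derive(Debug, Clone, Default, PartialEq)]
struct Overrides {
    log_file: Option<String>,
    monitoring_window_secs: Option<u64>,
    webhook_url: Option<String>,
}

/// For each field, prefer the CLI value and fall back to the file value.
fn resolve(cli: Overrides, file: Overrides) -> Overrides {
    Overrides {
        log_file: cli.log_file.or(file.log_file),
        monitoring_window_secs: cli
            .monitoring_window_secs
            .or(file.monitoring_window_secs),
        webhook_url: cli.webhook_url.or(file.webhook_url),
    }
}
```

For example, passing --monitoring-window-secs 60 alongside the config shown above would yield a 60-second window while the log file and webhook URL still come from the file.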
Slack incoming webhook URLs get a richer alert payload with Block Kit sections for the regression summary, metric deltas, dominant error signature, and timeline. Other webhook URLs receive the plain text alert body.
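One way to picture the payload choice is a small function that inspects the webhook URL. The sketch below assumes Slack URLs are recognized by the `https://hooks.slack.com/` prefix that Slack incoming webhooks use; how src/alert.rs actually makes this decision is not stated here, and the `AlertPayload` enum and `payload_for` function are hypothetical.

```rust
/// Which alert body to send to a webhook (hypothetical enum).
#[derive(Debug, PartialEq)]
enum AlertPayload {
    /// Block Kit sections: summary, metric deltas, error signature, timeline.
    SlackBlocks,
    /// The plain text alert body.
    PlainText,
}

/// Picks the payload shape from the webhook URL.
/// Assumption: a Slack incoming webhook is identified by its URL prefix.
fn payload_for(webhook_url: &str) -> AlertPayload {
    if webhook_url.starts_with("https://hooks.slack.com/") {
        AlertPayload::SlackBlocks
    } else {
        AlertPayload::PlainText
    }
}
```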
The dashboard can explain an incident with Ollama or a built-in lightweight explainer.
By default, WATCHDOG_EXPLAINER=auto tries Ollama first and falls back to the local explainer if Ollama is not running, which keeps the demo flow reliable.
# Always use the built-in lightweight explainer
WATCHDOG_EXPLAINER=local cargo run -- serve --state-dir .watchdog-demo --port 3001
# Require Ollama instead of falling back locally
WATCHDOG_EXPLAINER=ollama WATCHDOG_OLLAMA_MODEL=gemma3 cargo run -- serve --state-dir .watchdog-demo --port 3001
The local explainer uses the captured incident evidence only: deploy timing, metric deltas, dominant error signature, request rate, and baseline comparison.
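The three explainer modes can be summarized as a small dispatch: `local` never calls Ollama, `ollama` surfaces Ollama failures, and `auto` tries Ollama and falls back to the local explainer on error. The sketch below illustrates that behavior with closures standing in for the two explainers. The `ExplainerMode` enum, `parse_mode`, and `explain` are hypothetical names, and rejecting unknown values is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExplainerMode {
    Auto,
    Local,
    Ollama,
}

/// Parses the WATCHDOG_EXPLAINER value; an unset variable means `auto`.
/// Assumption: unknown values are rejected rather than silently ignored.
fn parse_mode(value: Option<&str>) -> Result<ExplainerMode, String> {
    match value.unwrap_or("auto") {
        "auto" => Ok(ExplainerMode::Auto),
        "local" => Ok(ExplainerMode::Local),
        "ollama" => Ok(ExplainerMode::Ollama),
        other => Err(format!("unknown WATCHDOG_EXPLAINER value: {}", other)),
    }
}

/// Runs the explainer selected by `mode`.
/// `ollama` may fail (for example when the server is not running);
/// `local` always produces a summary from the captured evidence.
fn explain<O, L>(mode: ExplainerMode, ollama: O, local: L) -> Result<String, String>
where
    O: FnOnce() -> Result<String, String>,
    L: FnOnce() -> String,
{
    match mode {
        ExplainerMode::Local => Ok(local()),
        ExplainerMode::Ollama => ollama(),
        // Try Ollama first, fall back to the local explainer on any error.
        ExplainerMode::Auto => ollama().or_else(|_| Ok(local())),
    }
}
```

Keeping the fallback in one place means the dashboard can always return some explanation in `auto` mode, which matches the reliability goal described above.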
- Real: Rust detection engine, CUSUM metric shift detection, deploy correlation, log signature extraction, Supabase/SQLite/JSON incident persistence, Vercel serverless APIs, notes/status updates, exports, health endpoint, explanation caching, and persisted triage-agent reports.
- Simulated for demo: JSONL metrics, deploy-event source, and log lines generated by cargo run -- demo, cargo run -- simulate, or the hosted dashboard deploy buttons.
- Replaceable in production: JSONL ingestion can be swapped for Prometheus/OpenTelemetry/webhook ingestion while keeping the detection, storage, and triage workflow.
flowchart LR
A["Demo user<br/>deploys checkout/payments"] --> B["Vercel frontend<br/>vercel-demo/index.html"]
B --> C["POST /api/deployments/start<br/>Vercel serverless API"]
C --> D["WatchDog deploy monitor<br/>baseline vs new release"]
D --> E["Evidence explanation<br/>deterministic incident summary"]
E --> F["Triage agent<br/>confidence, action, limits"]
F --> G["Supabase Postgres<br/>incidents.incident_json"]
G --> H["Dashboard detail<br/>history, notes, status, audit trail"]
H --> B
The hosted demo is autonomous after the deploy event: it detects the regression, generates the explanation, runs the triage agent, and stores the incident in Supabase. It deliberately does not auto-rollback production; rollback remains a human-approved action.
flowchart TD
A["Deploy event<br/>CLI, deploy hook, or hosted scenario"] --> B["WatchDog Engine"]
C["Metric samples<br/>JSONL demo stream"] --> B
D["Log events<br/>app.log / JSON lines"] --> B
B --> E["Rolling baseline buffer"]
B --> F["CUSUM detector"]
B --> G["Error signature extractor"]
E --> H["Regression verdict"]
F --> H
G --> H
H --> I["Storage adapter"]
I --> J["Supabase Postgres<br/>hosted cloud demo"]
I --> K["SQLite DB<br/>local/Docker demo"]
I --> P["Incident JSON files<br/>fallback mode"]
J --> L["Axum dashboard API"]
K --> L
P --> L
L --> M["Web console<br/>history, detail, notes, status"]
L --> N["Explain Incident<br/>local explainer or Ollama"]
L --> O["Exports<br/>Markdown / JSON"]
N --> I
M --> I
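Attributing a detected shift to a deploy can be sketched as picking the most recent deploy whose monitoring window still covers the detection time. This is an assumption about how correlation might work, not a description of src/engine.rs. The `DeployEvent` struct and `attribute` function are hypothetical, and timestamps are simplified to Unix seconds.

```rust
/// A recorded deploy (hypothetical struct; time is in Unix seconds).
#[derive(Debug, Clone, PartialEq)]
struct DeployEvent {
    deploy_id: String,
    deployed_at: u64,
}

/// Returns the latest deploy that happened at or before `detected_at`
/// and no more than `window_secs` earlier, if there is one.
fn attribute<'a>(
    deploys: &'a [DeployEvent],
    detected_at: u64,
    window_secs: u64,
) -> Option<&'a DeployEvent> {
    deploys
        .iter()
        .filter(|d| d.deployed_at <= detected_at && detected_at - d.deployed_at <= window_secs)
        .max_by_key(|d| d.deployed_at)
}
```

A detection that falls outside every window returns `None`, which corresponds to a metric shift that should not be blamed on any recorded release.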
Record a real deploy event:
cargo run -- notify --state-dir .WatchDog --deploy v1.4.2 --environment production
Run benchmark scenarios:
cargo run -- benchmark --trials 100
WatchDog benchmark summary
trials: 100
healthy false positives: 0
bad deploys detected: 100
bad deploys missed: 0
average detection latency: 4.00s
best detection latency: 4s
worst detection latency: 4s
This benchmark is deterministic and scoped to the built-in synthetic scenarios. It is a repo quality signal, not a universal production guarantee.
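The summary figures above are simple aggregates over per-trial results. The sketch below shows one way such a summary could be computed, assuming each bad-deploy trial yields either a detection latency in seconds or a miss. The `Summary` struct and `summarize` function are hypothetical and are not taken from src/benchmark.rs.

```rust
/// Aggregate results over bad-deploy trials (hypothetical struct).
#[derive(Debug, PartialEq)]
struct Summary {
    detected: usize,
    missed: usize,
    average_secs: f64,
    best_secs: u64,
    worst_secs: u64,
}

/// Each entry is Some(latency in seconds) for a detected bad deploy
/// or None for a missed one. Returns None if nothing was detected.
fn summarize(latencies: &[Option<u64>]) -> Option<Summary> {
    let hits: Vec<u64> = latencies.iter().flatten().copied().collect();
    let best = *hits.iter().min()?;
    let worst = *hits.iter().max()?;
    let average = hits.iter().sum::<u64>() as f64 / hits.len() as f64;
    Some(Summary {
        detected: hits.len(),
        missed: latencies.len() - hits.len(),
        average_secs: average,
        best_secs: best,
        worst_secs: worst,
    })
}
```

With deterministic scenarios every trial produces the same latency, which is why the best, worst, and average values in the output above coincide.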
WatchDog reads and writes JSONL files inside the state directory, plus a SQLite database when SQLite storage is enabled:
- metrics.jsonl
- deploy-events.jsonl
- watchdog.sqlite when WATCHDOG_STORAGE=sqlite
Example metric sample:
{"timestamp":"2026-03-30T19:30:00Z","error_rate":0.02,"p95_latency_ms":190.0,"request_rate":1200.0}
A tiny deploy hook is included at examples/deploy.sh. It shows how a deploy pipeline can notify WatchDog with one line.
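For anyone feeding metrics.jsonl from their own process, the metric sample format shown above is one JSON object per line. The sketch below writes such a line using only the standard library. The `MetricSample` struct, `to_jsonl_line`, and `append_sample` are hypothetical, and a real integration would more likely use a JSON library, since this version does not escape strings or handle non-finite numbers.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// One metric sample; field names match the example line in this README.
struct MetricSample {
    timestamp: String, // RFC 3339 text, assumed to contain no quotes
    error_rate: f64,
    p95_latency_ms: f64,
    request_rate: f64,
}

impl MetricSample {
    /// Renders the sample as a single JSON object without a trailing newline.
    /// `{:?}` keeps a decimal point on whole numbers (190.0, not 190).
    /// Assumption: all values are finite; NaN or infinity would not be valid JSON.
    fn to_jsonl_line(&self) -> String {
        format!(
            "{{\"timestamp\":\"{}\",\"error_rate\":{:?},\"p95_latency_ms\":{:?},\"request_rate\":{:?}}}",
            self.timestamp, self.error_rate, self.p95_latency_ms, self.request_rate
        )
    }
}

/// Appends one sample as a new line, creating the file if needed.
fn append_sample(path: &Path, sample: &MetricSample) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", sample.to_jsonl_line())
}
```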
- Prometheus or OpenTelemetry metrics ingestion
- Database-backed multi-tenant storage
- GitHub Actions or deploy-platform integration for automatic deploy notifications