feat(comptroller): real Workers-AI (T0) cost insights at GET /api/v1/insights #83
chitcommit wants to merge 3 commits into
…t ledger
Replace stubbed worker internals with real implementations:
- getDb/getWriteDb helpers over porsager `postgres` driver on Hyperdrive connectionString (the old `env.NEON_COMPTROLLER.query()` was fictional).
- pullCFAIGatewayAnalytics: real CF AI Gateway /logs ingest for the 4 active gateways with KV high-water dedup, bounded pagination, batch INSERT into chittyops.cost_ledger; tolerant per-gateway failure.
- tierFromModel maps to CHECK-constraint-valid tiers (T0/T3_opus/T3_sonnet/T2_haiku/manual); validated on a Neon temp branch (caught a tier-CHECK bug).
- detectAnomalies/isServiceExempt/budgetStatus/refreshCostLedgerView refactored to the real driver; matview refresh fail-soft on privilege.
- /api/v1/metrics, fetchDailyReport, listAnomalies, checkHardCaps: real queries.
- storeAnomalies/listAnomalies hit chittyops.anomalies (fixed stale comment).
- signHmac: real HMAC-SHA256, fail-closed when key absent.
- Notion + Quo emitters: real not-configured guards (Phase B), no fake content.
- Cold-start + 14d baseline-learning KV state set on first run (safe-state).
WRITER-CONNECTION BLOCKER (Phase A): the Hyperdrive binding is read-only (comptroller_reader). cost_ledger/anomalies writes require a SEPARATE RW Hyperdrive binding (NEON_COMPTROLLER_WRITER). Until provisioned, getWriteDb() fails closed and ingest is skipped (logged), so the poll never errors.
Read path validated live: /api/v1/metrics returns real total_count=0 matching Neon. Both INSERT column lists schema-validated on a disposable Neon branch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…B clients
NEON_COMPTROLLER_WRITER Hyperdrive (4427ea04, comptroller_writer role, append-only INSERT on cost_ledger+anomalies).
Refactor DB access to per-invocation postgres clients via AsyncLocalStorage scope, ending them with ctx.waitUntil to avoid stale Hyperdrive clients across cron isolate reuse.
Verified live: cost_ledger 0 -> 1900+ rows across chittygateway + chittycounsel, real cost/token mapping (chittycounsel $0.063 captured).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…insights
Adds AI categorization + deeper insight grounded in real cost_ledger data,
fulfilling "mini opportunities for ai categorization/deeper insight" with
genuine AI rather than static COA mapping. SPEC-compliant: Workers-AI @cf/*
models are T0, so this NEVER uses an LLM above T0.
Design — SQL owns every number, the model owns only prose:
- queryInsightsAggregates() computes all figures in SQL (today/all-time spend,
per-service+tier+provider, 7-day daily trend, top models by cost and by
call volume, workers-ai vs external-provider split).
- runInsightsModel() feeds those finished figures to @cf/meta/llama-3.1-8b-instruct
ONCE (not per-row) and asks for narrative-only fields: per-service category +
characterization, cost drivers, trend/anomaly notes, 2-4 grounded recs. The
prompt forbids inventing/restating costs and editorializing magnitude.
- Numeric fields in the response come straight from the queries; only the prose
comes from the model — so "grounded, no fabrication" is structural, and the
figure cross-check trivially holds.
- response: {generated_at, window, totals, per_service[], drivers[], trends[],
recommendations[], daily_trend[], top_models_by_cost[], top_models_by_calls[],
model_used}. Parse failures surface raw model text (no fabricated fallback).
- Cached ~6h in KV_STATE (insights:{chicago-date}); ?refresh=1 bypasses. Never
runs on the 5-min poll (avoids meta-cost).
- Empty-state: zero rows in window returns a clear empty result, skips the model.
wrangler.toml: adds [ai] binding = "AI" (free, on-account, no new secret).
Verified live at comptroller.chitty.cc/api/v1/insights — figures match a direct
Neon query exactly (today $0.077455/2055 calls, all-time $0.238430/4471 rows,
qwen3-embedding $0.069217). Model correctly characterized chittycounsel as an
embedding-heavy workload and flagged the real 6/8→6/10 cost ramp.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai review
Pull request overview
Adds a new /api/v1/insights endpoint to the Comptroller Worker that computes cost/usage aggregates in SQL and uses Workers AI (T0) to generate narrative-only insight, while also introducing a new Hyperdrive write path and AI Gateway log ingestion into chittyops.cost_ledger.
Changes:
- Adds Workers AI binding + `/api/v1/insights` endpoint with KV caching and narrative-only model output.
- Refactors Neon access to use per-invocation `postgres` clients (AsyncLocalStorage + Hyperdrive connection strings), plus adds optional writer binding and ingestion/insert paths.
- Introduces a standalone TypeScript package setup for the service (tsconfig, package.json, pnpm lock).
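
As an illustration of the per-invocation scoping mentioned in the second change, here is a minimal sketch. It assumes `nodejs_compat` makes `node:async_hooks` available; `DbClient`, `makeClient`, and `withDbScope` are hypothetical stand-ins, and the real worker would construct a porsager `postgres` client from the Hyperdrive connection string instead.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-in for a real database client.
interface DbClient {
  id: number;
  end(): Promise<void>;
}

interface InvocationStore {
  client?: DbClient;
}

const dbScope = new AsyncLocalStorage<InvocationStore>();
let nextClientId = 1;

// Hypothetical factory; a real one would open a connection.
function makeClient(): DbClient {
  return { id: nextClientId++, end: async () => {} };
}

// Returns the client bound to the current invocation, creating it lazily.
function getDb(): DbClient {
  const store = dbScope.getStore();
  if (!store) throw new Error("getDb() called outside an invocation scope");
  if (!store.client) store.client = makeClient();
  return store.client;
}

// Runs one fetch/scheduled invocation inside its own scope and hands the
// client's shutdown to waitUntil so it is not reused by a later invocation.
async function withDbScope<T>(
  fn: () => Promise<T>,
  waitUntil: (p: Promise<unknown>) => void,
): Promise<T> {
  const store: InvocationStore = {};
  try {
    return await dbScope.run(store, fn);
  } finally {
    if (store.client) waitUntil(store.client.end());
  }
}
```

In a Worker handler, `withDbScope(() => handle(request, env), (p) => ctx.waitUntil(p))` would wrap the whole invocation, with `handle` being whatever the handler already does.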
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| services/comptroller/wrangler.toml | Enables nodejs_compat, adds AI binding + Hyperdrive writer binding, adjusts routes/triggers. |
| services/comptroller/worker.ts | Adds per-invocation DB scoping, AI Gateway ingestion + inserts, and /api/v1/insights implementation. |
| services/comptroller/tsconfig.json | Adds strict TS config for the Worker package. |
| services/comptroller/package.json | Adds service-local dependencies/scripts (wrangler/tsc/postgres). |
| services/comptroller/pnpm-lock.yaml | Locks service-local dependency graph. |
| services/comptroller/node-async-hooks.d.ts | Adds minimal ambient typing for AsyncLocalStorage under nodejs_compat. |
Files not reviewed (1)
- services/comptroller/pnpm-lock.yaml: Language not supported
```toml
routes = [
  { pattern = "comptroller.chitty.cc/*", custom_domain = true }
  { pattern = "comptroller.chitty.cc", custom_domain = true }
]
```
```ts
const rows = fresh.map((l) => ({
  service: gw,
  tier: tierFromModel(l.model),
  provider: l.provider ?? "unknown",
  model: l.model ?? "unknown",
  tokens_in: Math.round(l.tokens_in ?? 0),
  tokens_out: Math.round(l.tokens_out ?? 0),
  cached_tokens_in: Math.round(l.usage_metadata?.input_cached_tokens ?? 0),
  cost_usd: Number(l.cost ?? 0),
  latency_ms: Math.round(l.timings?.latency ?? 0),
  item_id_hash: l.id,
  run_id: null as string | null,
  fallback_chain: null as string[] | null,
  ts: l.created_at,
  cost_constrained: false,
}));
```
```ts
const perService = (await db`
  SELECT service,
         coalesce(sum(cost_usd),0)::float8 AS cost_usd,
         count(*)::int AS calls,
         coalesce(sum(tokens_in),0)::bigint AS tokens_in,
         coalesce(sum(tokens_out),0)::bigint AS tokens_out,
         (array_agg(provider ORDER BY cost_usd DESC NULLS LAST))[1] AS top_provider,
         (array_agg(tier ORDER BY cost_usd DESC NULLS LAST))[1] AS top_tier
  FROM chittyops.cost_ledger
  WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago')
  GROUP BY service
  ORDER BY cost_usd DESC
`) as any[];
```
```ts
const modelsByCost = (await db`
  SELECT model, (array_agg(provider))[1] AS provider,
         coalesce(sum(cost_usd),0)::float8 AS cost_usd, count(*)::int AS calls
  FROM chittyops.cost_ledger
  WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago') - interval '6 days'
  GROUP BY model
  ORDER BY cost_usd DESC
  LIMIT 5
`) as any[];
```
```ts
const modelsByCalls = (await db`
  SELECT model, (array_agg(provider))[1] AS provider,
         coalesce(sum(cost_usd),0)::float8 AS cost_usd, count(*)::int AS calls
  FROM chittyops.cost_ledger
  WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago') - interval '6 days'
  GROUP BY model
  ORDER BY calls DESC
  LIMIT 5
`) as any[];
```
```ts
 * Data layer:
 *   - READ  : env.NEON_COMPTROLLER (Hyperdrive, comptroller_reader, read-only) → getDb(env)
 *   - WRITE : env.NEON_COMPTROLLER_WRITER (Hyperdrive over RW role) → getWriteDb(env)
 * Both Hyperdrive bindings expose a `.connectionString`; we drive them with
 * porsager `postgres` (works on Workers over Hyperdrive's TCP socket).
 * getWriteDb() FAILS CLOSED if the writer binding is absent (Phase-A blocker).
 */
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e3011d873
```ts
if (maxSeen > hwmMs) {
  await env.KV_STATE.put(hwmKey, new Date(maxSeen).toISOString());
```
Do not advance the high-water mark past unprocessed pages
When a gateway has more than MAX_PAGES_PER_GATEWAY * LOGS_PER_PAGE new log rows, this loop stops after the page cap but still stores maxSeen from the newest fetched page. On the next poll, fresh only keeps rows with created_at > hwmMs, so every older log that was beyond the page cap is skipped permanently, undercounting cost_ledger after initial deploys, downtime, or high-volume bursts. Only advance the high-water mark once the pagination has reached the old high-water mark/end, or keep a cursor that does not discard the unprocessed tail.
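
One way to read the first option in this comment is sketched below. It is not the PR's code: `planIngest` and the `GatewayLog` shape are hypothetical, pages are passed in pre-fetched for illustration, and the sketch assumes the log listing is returned newest-first. The mark only advances once a page reaches the old mark or the listing is exhausted.

```typescript
// Hypothetical minimal log shape; only the fields this sketch needs.
interface GatewayLog {
  id: string;
  created_at: string; // ISO timestamp
}

interface IngestPlan {
  fresh: GatewayLog[]; // rows newer than the stored high-water mark
  caughtUp: boolean; // true if pagination reached the old mark or the end
  nextHwmMs: number; // value that is safe to persist as the new mark
}

// Assumes `pages` is ordered newest-first, as is each page's contents.
function planIngest(pages: GatewayLog[][], hwmMs: number, maxPages: number): IngestPlan {
  const fresh: GatewayLog[] = [];
  let maxSeen = hwmMs;
  let caughtUp = false;
  let fetched = 0;

  for (const page of pages) {
    if (fetched >= maxPages || caughtUp) break;
    fetched++;
    for (const log of page) {
      const ts = Date.parse(log.created_at);
      if (ts > hwmMs) {
        fresh.push(log);
        if (ts > maxSeen) maxSeen = ts;
      } else {
        caughtUp = true; // reached rows already covered by the old mark
      }
    }
  }
  if (fetched === pages.length) caughtUp = true; // listing exhausted

  // If the page cap cut pagination short, keep the old mark so the
  // unprocessed tail is not skipped on the next poll.
  return { fresh, caughtUp, nextHwmMs: caughtUp ? maxSeen : hwmMs };
}
```

When `caughtUp` is false, the rows in `fresh` will be seen again on the next poll, so this approach only works together with idempotent inserts or by deferring the insert until the backlog is fully paged.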
```ts
const enc = new TextEncoder();
const a = enc.encode(presented);
const b = enc.encode(key);
if (a.byteLength !== b.byteLength) return false;
```
Avoid returning before constant-time comparison
For _admin/poll, any request with a bearer token length different from COMPTROLLER_HMAC_KEY returns before timingSafeEqual, so an attacker who can repeatedly time requests can learn the secret length even though this helper is documented as constant-time. Cloudflare’s own timingSafeEqual guidance warns not to return early on length mismatch; compare a same-length dummy/self buffer and then fail instead.
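
The pattern this comment describes could look roughly like the sketch below. `tokenMatches` and `xorCompare` are hypothetical names; `xorCompare` is a portable stand-in used here so the example runs outside Workers, where the worker itself would call `crypto.subtle.timingSafeEqual` in its place. The empty-key guard is an added assumption in the spirit of the PR's fail-closed behavior.

```typescript
// Portable stand-in for a constant-time byte comparison.
// Precondition: both arrays have the same length.
function xorCompare(x: Uint8Array, y: Uint8Array): boolean {
  let diff = 0;
  for (let i = 0; i < x.length; i++) {
    diff |= (x[i] ?? 0) ^ (y[i] ?? 0);
  }
  return diff === 0;
}

function tokenMatches(presented: string, key: string): boolean {
  if (key.length === 0) return false; // assumption: fail closed when no key is configured

  const enc = new TextEncoder();
  const a = enc.encode(presented);
  const b = enc.encode(key);
  const lengthsMatch = a.byteLength === b.byteLength;

  // On a length mismatch, still run a comparison (the presented token
  // against itself) so the mismatch path does comparable work, then fail.
  const equal = lengthsMatch ? xorCompare(a, b) : xorCompare(a, a);
  return lengthsMatch && equal;
}
```

The work done depends on the length of the presented token, which the caller already controls, rather than on whether it happens to match the length of the secret.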
```ts
const baseline = await env.KV_STATE.get(BASELINE_LEARNING_KEY);
if (!baseline) {
  const until = new Date(Date.now() + BASELINE_LEARNING_DAYS * 24 * 3600 * 1000).toISOString();
  await env.KV_STATE.put(BASELINE_LEARNING_KEY, until);
```
Preserve manual baseline-learning end
When an operator uses /_admin/baseline_learning/end, it deletes this KV key, but the next scheduled run, status request, or admin poll calls ensureColdStartState() and treats the missing key as a fresh deploy by writing a new 14-day window. Since pollMetrics() suppresses L2/L3 while isBaselineLearningActive() is true, the documented manual override is immediately undone and Comptroller remains L1-only; store an initialized/disabled marker or set an expired timestamp instead of recreating after deletion.
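
A possible shape for the "disabled marker" suggestion is sketched here as a pure decision function. `resolveBaselineState` and the `"disabled"` sentinel are hypothetical; the sketch assumes the manual-end endpoint would write the sentinel instead of deleting the key, so that a missing key still unambiguously means a fresh deploy.

```typescript
const BASELINE_LEARNING_DAYS = 14;
const DISABLED_MARKER = "disabled"; // hypothetical sentinel written by the manual-end endpoint

interface BaselineDecision {
  write: string | null; // value to store in KV, or null to leave the key untouched
  active: boolean; // whether baseline learning is currently in effect
}

// `stored` is whatever KV returned for the baseline-learning key (null if absent).
function resolveBaselineState(stored: string | null, nowMs: number): BaselineDecision {
  if (stored === null) {
    // Genuinely fresh deploy: open a new learning window.
    const until = new Date(nowMs + BASELINE_LEARNING_DAYS * 24 * 3600 * 1000).toISOString();
    return { write: until, active: true };
  }
  if (stored === DISABLED_MARKER) {
    // Operator ended learning manually: never recreate the window.
    return { write: null, active: false };
  }
  // Otherwise the stored value is the window's end timestamp.
  return { write: null, active: Date.parse(stored) > nowMs };
}
```

The cold-start routine would then call this with the KV value and only `put` when `write` is non-null.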
```ts
const anomalyCount = (await db`
  SELECT count(*)::int AS n FROM chittyops.anomalies
  WHERE detected_at >= date_trunc('day', now() AT TIME ZONE 'America/Chicago')
`) as any[];
```
Add storage for the new anomalies query
I searched repo-wide for CREATE TABLE definitions for anomalies and only found references; the migrations create cost_ledger and pause_exemptions, but not chittyops.anomalies. On a fresh deploy following the checked-in migrations/runbook, /reports/daily now executes this unguarded query and returns 500, which also breaks the AGENTS deploy verification step for /reports/daily; add the table migration or fail-soft here like listAnomalies() does.
```ts
await writeDb`
  INSERT INTO chittyops.cost_ledger ${writeDb(
```
Make gateway log inserts idempotent
If the 5-minute cron overlaps with /_admin/poll or a previous long-running poll, both invocations can read the same KV high-water mark before either stores the new value and then execute this insert for the same Cloudflare log IDs. The cost ledger migration has no uniqueness constraint on item_id_hash, so those races double-count calls and cost; use a per-gateway lock/cursor or a unique key with ON CONFLICT DO NOTHING before enabling concurrent manual and scheduled ingest.
```ts
console.error("[isServiceExempt] query failed:", e);
return false; // fail-safe: do not block on lookup failure for L2/L3 gate caller
```
Fail closed when exemption lookup fails
If the exemption query fails, this returns false, and emitL3Signal() treats a protected service as non-exempt and posts to its pause endpoint without SMS confirmation. This is likely in the checked-in setup because I found no migration granting comptroller_reader access to chittyops.pause_exemptions; the previous uncaught error would have aborted the pause path, while this catch turns permission/table outages into fail-open behavior against the AGENTS hard rule to never pause exempt services without explicit SMS confirm.
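
The fail-closed behavior requested here can be expressed as a small decision function. This is only a sketch: `ExemptionLookup` and `mayPauseWithoutConfirm` are hypothetical names, and the real `isServiceExempt()` would need to report lookup failure separately from "not exempt" for this to work.

```typescript
// Hypothetical result type that keeps "lookup failed" distinct from "not exempt".
type ExemptionLookup =
  | { ok: true; exempt: boolean }
  | { ok: false; error: unknown };

// True only when the service is positively known to be non-exempt.
// A failed lookup is treated like an exempt service: no pause without SMS confirmation.
function mayPauseWithoutConfirm(lookup: ExemptionLookup): boolean {
  if (!lookup.ok) return false; // fail closed on permission or table outages
  return !lookup.exempt;
}
```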
```ts
tokens_out: Math.round(l.tokens_out ?? 0),
cached_tokens_in: Math.round(l.usage_metadata?.input_cached_tokens ?? 0),
cost_usd: Number(l.cost ?? 0),
latency_ms: Math.round(l.timings?.latency ?? 0),
```
Read latency from the gateway log duration
Cloudflare's List Gateway Logs response documents duration on each LogListResponse and does not include a timings.latency object, so every ingested row from this endpoint will store latency_ms as 0. That makes the materialized view's average and p95 latency fields unusable for dashboards/anomaly analysis even though the source API provides the value; map the documented duration field instead.
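
A sketch of the mapping this comment asks for follows. `latencyMsFromLog` is a hypothetical helper, and it assumes, per the comment, that each log entry carries a numeric `duration`; treating that value as milliseconds is also an assumption that should be checked against the API reference.

```typescript
// Hypothetical minimal log shape; only the field this sketch needs.
interface GatewayLogTiming {
  duration?: number | null;
}

// Maps the log's duration to latency_ms, falling back to 0 when it is missing
// or not a finite number (matching the existing "?? 0" behavior).
function latencyMsFromLog(log: GatewayLogTiming): number {
  const d = log.duration;
  return typeof d === "number" && Number.isFinite(d) ? Math.round(d) : 0;
}
```

In the row mapping, `latency_ms: latencyMsFromLog(l)` would replace the `l.timings?.latency` lookup.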
@claude resolve conflicts
What
Adds a real AI-categorization + deeper-insight endpoint to the live ChittyComptroller worker, using Cloudflare Workers-AI (T0). SPEC-compliant: `@cf/*` models are T0, so this NEVER uses an LLM above T0. Replaces reliance on static COA mapping with genuine AI insight grounded in real `chittyops.cost_ledger` data.

Endpoint
`GET /api/v1/insights` (`?refresh=1` to bypass cache)

Design: SQL owns every number, the model owns only prose
- `queryInsightsAggregates()` computes all figures in SQL: today + all-time spend, per-service+tier+provider, 7-day daily trend, top models by cost and by call volume, and the workers-ai vs external-provider split.
- `runInsightsModel()` feeds those finished figures to `@cf/meta/llama-3.1-8b-instruct` once (not per-row) and asks for narrative-only fields. The prompt forbids inventing/restating costs or editorializing magnitude.
- Parse failures surface the raw model text (`narrative_error` + `model_raw`); no fabricated fallback.
- Cached ~6h in `KV_STATE` (`insights:{chicago-date}`); never runs on the 5-min poll (avoids meta-cost).

Response shape
`{generated_at, window, totals, per_service[], drivers[], trends[], recommendations[], daily_trend[], top_models_by_cost[], top_models_by_calls[], model_used}`

wrangler.toml
Adds `[ai] binding = "AI"`: free, on-account, no new secret.

Live verification
Deployed (version `f10db6b3-1aa9-4f06-9167-559a2615bb2b`) and curled live at `comptroller.chitty.cc/api/v1/insights`. Figures match a direct Neon query exactly (per the commit message: today $0.077455 / 2055 calls, all-time $0.238430 / 4471 rows, qwen3-embedding $0.069217).

The model correctly characterized chittycounsel as an embedding-heavy workload (qwen3-embedding, high tokens_in / ~0 tokens_out) and flagged the real 6/8 → 6/10 cost ramp as the notable trend.
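
To make the "SQL owns every number, the model owns only prose" split concrete, here is a minimal sketch of how the merge step might be structured. `mergeNarrative` and its types are hypothetical, and the `narrative_error` message text is made up for illustration; only the `narrative_error` and `model_raw` field names come from the description above. Numeric fields are copied from the SQL aggregates, and any numbers present in the model output are ignored.

```typescript
interface ServiceAggregate {
  service: string;
  cost_usd: number;
  calls: number;
}

interface PerServiceInsight extends ServiceAggregate {
  category: string | null;
  characterization: string | null;
}

interface InsightsBody {
  per_service: PerServiceInsight[];
  narrative_error?: string;
  model_raw?: string;
}

function mergeNarrative(aggregates: ServiceAggregate[], modelText: string): InsightsBody {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelText);
  } catch {
    // Parse failure: keep the SQL figures, surface the raw text, invent nothing.
    return {
      per_service: aggregates.map((a) => ({ ...a, category: null, characterization: null })),
      narrative_error: "model output was not valid JSON",
      model_raw: modelText,
    };
  }

  // Index the model's prose by service name, reading only string fields.
  const prose = new Map<string, { category: unknown; characterization: unknown }>();
  const list = (parsed as { per_service?: unknown } | null)?.per_service;
  if (Array.isArray(list)) {
    for (const item of list as Array<Record<string, unknown> | null>) {
      if (item && typeof item.service === "string") {
        prose.set(item.service, { category: item.category, characterization: item.characterization });
      }
    }
  }

  return {
    per_service: aggregates.map((a) => {
      const p = prose.get(a.service);
      return {
        service: a.service,
        cost_usd: a.cost_usd, // always from SQL
        calls: a.calls, // always from SQL
        category: typeof p?.category === "string" ? p.category : null,
        characterization: typeof p?.characterization === "string" ? p.characterization : null,
      };
    }),
  };
}
```

Because the numeric fields never pass through the model, a cross-check of the response against a direct query only has to verify the SQL.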
🤖 Generated with Claude Code