feat(comptroller): real Workers-AI (T0) cost insights at GET /api/v1/insights#83

Open
chitcommit wants to merge 3 commits into main from feat/comptroller-phase-a-cf-ai-gateway-ingest

Conversation

@chitcommit

Contributor

What

Adds a real AI-categorization + deeper-insight endpoint to the live ChittyComptroller worker, using Cloudflare Workers-AI (T0) — SPEC-compliant (@cf/* models are T0; this NEVER uses an LLM above T0). Replaces reliance on static COA mapping with genuine AI insight grounded in real chittyops.cost_ledger data.

Endpoint

GET /api/v1/insights (?refresh=1 to bypass cache)
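A minimal sketch of how the cache key and the `?refresh=1` bypass could be computed. The `insights:` prefix, the America/Chicago date, and the ~6h lifetime come from this description; `chicagoDate`, `insightsCacheKey`, `bypassCache`, and `INSIGHTS_TTL_SECONDS` are hypothetical names, not the worker's actual code.

```typescript
// Hypothetical helpers for the insights cache key and the ?refresh=1 bypass.
// Prefix, time zone, and ~6h TTL come from the PR text; names are illustrative.

/** Returns YYYY-MM-DD for the given instant as seen in America/Chicago. */
function chicagoDate(now: Date): string {
  // The en-CA locale formats dates as YYYY-MM-DD.
  return new Intl.DateTimeFormat("en-CA", {
    timeZone: "America/Chicago",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(now);
}

/** KV key for the cached insights payload, e.g. "insights:2026-06-10". */
function insightsCacheKey(now: Date): string {
  return `insights:${chicagoDate(now)}`;
}

/** True when the caller asked to skip the cache with ?refresh=1. */
function bypassCache(url: URL): boolean {
  return url.searchParams.get("refresh") === "1";
}

/** Approximate cache lifetime from the PR (~6h), in seconds. */
const INSIGHTS_TTL_SECONDS = 6 * 60 * 60;
```

Because the key embeds the Chicago calendar date, a new day produces a new key even before the ~6h lifetime lapses; the TTL could be passed as `expirationTtl` to a KV `put`.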

Design — SQL owns every number, the model owns only prose

  • queryInsightsAggregates() computes all figures in SQL: today + all-time spend, per-service+tier+provider, 7-day daily trend, top models by cost and by call volume, and the workers-ai vs external-provider split.
  • runInsightsModel() feeds those finished figures to @cf/meta/llama-3.1-8b-instruct once (not per-row) and asks for narrative-only fields. The prompt forbids inventing/restating costs or editorializing magnitude.
  • Numeric fields in the response come straight from the queries; only prose comes from the model → "grounded, no fabrication" is structural, not prompt-dependent.
  • Parse failures surface raw model text (narrative_error + model_raw) — no fabricated fallback.
  • Cached ~6h in KV_STATE (insights:{chicago-date}); never runs on the 5-min poll (avoids meta-cost).
  • Empty-state: zero rows returns a clear empty result and skips the model entirely.
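The SQL-versus-prose split can be illustrated with a small merge function. This is a sketch under assumed types: `Aggregates`, `Narrative`, and `mergeInsights` are hypothetical, and only three of the narrative keys are shown. What it demonstrates is structural: numeric fields are spread from the SQL result and never read from the parsed model output, and a parse failure returns the raw text next to the figures instead of a fabricated fallback.

```typescript
// Sketch of "SQL owns every number, the model owns only prose".
// Types and names are assumptions; only some response keys are modeled.

interface Aggregates {
  totals: { today_cost_usd: number; today_calls: number };
  per_service: { service: string; cost_usd: number; calls: number }[];
}

interface Narrative {
  drivers: string[];
  trends: string[];
  recommendations: string[];
}

type Merged = Aggregates & Partial<Narrative> & {
  narrative_error?: string;
  model_raw?: string;
};

/** Keeps only string entries from a model-provided array; anything else is dropped. */
function stringArray(v: unknown): string[] {
  return Array.isArray(v) ? v.filter((x): x is string => typeof x === "string") : [];
}

function mergeInsights(agg: Aggregates, modelText: string): Merged {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelText);
  } catch (e) {
    // No fabricated fallback: return the SQL figures plus the raw model text.
    return { ...agg, narrative_error: `unparseable model output: ${String(e)}`, model_raw: modelText };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ...agg, narrative_error: "model output is not a JSON object", model_raw: modelText };
  }
  const p = parsed as Record<string, unknown>;
  // Numbers are never read from `p`: a model-emitted `totals` key is ignored.
  return {
    ...agg,
    drivers: stringArray(p.drivers),
    trends: stringArray(p.trends),
    recommendations: stringArray(p.recommendations),
  };
}
```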

Response shape

{generated_at, window, totals, per_service[], drivers[], trends[], recommendations[], daily_trend[], top_models_by_cost[], top_models_by_calls[], model_used}
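One possible TypeScript rendering of that shape, with a constructor for the zero-row empty state. The key names come from the list above; every element type, the `window` structure, and `emptyInsights` itself are assumptions.

```typescript
// Hypothetical typing of the documented response keys; element types are assumed.

interface ModelStat {
  model: string;
  cost_usd: number;
  calls: number;
}

interface InsightsResponse {
  generated_at: string; // ISO-8601 timestamp
  window: { from: string; to: string };
  totals: { cost_usd: number; calls: number };
  per_service: unknown[];
  drivers: string[];
  trends: string[];
  recommendations: string[];
  daily_trend: { day: string; cost_usd: number }[];
  top_models_by_cost: ModelStat[];
  top_models_by_calls: ModelStat[];
  model_used: string | null; // null when the model was skipped
}

/** Empty-state result: zero ledger rows, so the model is never called. */
function emptyInsights(now: Date, window: { from: string; to: string }): InsightsResponse {
  return {
    generated_at: now.toISOString(),
    window,
    totals: { cost_usd: 0, calls: 0 },
    per_service: [],
    drivers: [],
    trends: [],
    recommendations: [],
    daily_trend: [],
    top_models_by_cost: [],
    top_models_by_calls: [],
    model_used: null,
  };
}
```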

wrangler.toml

Adds [ai] binding = "AI" — free, on-account, no new secret.
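A sketch of how the `AI` binding might be called once per request with pre-computed figures. `AiLike` and `EnvLike` are minimal hand-rolled stand-ins (a real project would use the Workers runtime types), the system prompt is illustrative rather than the PR's actual prompt, and reading a `response` string from the result is an assumption about this text-generation model's output.

```typescript
// Minimal stand-ins for the Workers AI binding; shapes are assumptions.
interface AiLike {
  run(model: string, inputs: { messages: { role: string; content: string }[] }): Promise<unknown>;
}

interface EnvLike {
  AI: AiLike; // bound by `[ai] binding = "AI"` in wrangler.toml
}

const INSIGHTS_MODEL = "@cf/meta/llama-3.1-8b-instruct";

/** Calls the model exactly once with figures already computed in SQL. */
async function runInsightsModelSketch(env: EnvLike, figures: object): Promise<string> {
  const messages = [
    {
      role: "system",
      // Illustrative wording only: ask for prose fields, forbid new numbers.
      content:
        "Return JSON with drivers, trends, recommendations (string arrays). " +
        "Do not invent or restate any cost figures.",
    },
    { role: "user", content: JSON.stringify(figures) },
  ];
  const out: unknown = await env.AI.run(INSIGHTS_MODEL, { messages });
  // Assumed result shape: { response: string }; otherwise return the raw payload.
  const text = typeof out === "object" && out !== null ? (out as { response?: unknown }).response : undefined;
  return typeof text === "string" ? text : JSON.stringify(out);
}
```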

Live verification

Deployed (version f10db6b3-1aa9-4f06-9167-559a2615bb2b) and curled live at comptroller.chitty.cc/api/v1/insights. Figures match a direct Neon query exactly:

| Figure | Endpoint | Neon |
|---|---|---|
| Today cost (USD) | 0.077455 | 0.077455 |
| Today calls | 2055 | 2055 |
| All-time cost (USD) | 0.238430 | 0.238430 |
| All-time rows | 4471 | 4471 |
| qwen3-embedding 7d cost (USD) | 0.069217 | 0.069217 |
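The endpoint-versus-Neon comparison can be expressed as a simple check. The figures are the ones reported for this deployment; `mismatches` is a hypothetical helper, and the tiny tolerance only guards against float formatting, since both sides are computed by SQL.

```typescript
// Hypothetical cross-check helper; values copied from the verification table.

type Figures = Record<string, number>;

/** Names of figures missing on either side or differing by more than `eps`. */
function mismatches(endpoint: Figures, neon: Figures, eps = 1e-9): string[] {
  const keys = Array.from(new Set(Object.keys(endpoint).concat(Object.keys(neon))));
  return keys.filter((k) => {
    const a = endpoint[k];
    const b = neon[k];
    return a === undefined || b === undefined || Math.abs(a - b) > eps;
  });
}

const endpointFigures: Figures = {
  today_cost_usd: 0.077455,
  today_calls: 2055,
  all_time_cost_usd: 0.238430,
  all_time_rows: 4471,
  qwen3_embedding_7d_cost_usd: 0.069217,
};

const neonFigures: Figures = {
  today_cost_usd: 0.077455,
  today_calls: 2055,
  all_time_cost_usd: 0.238430,
  all_time_rows: 4471,
  qwen3_embedding_7d_cost_usd: 0.069217,
};
```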

The model correctly characterized chittycounsel as an embedding-heavy workload (qwen3-embedding, high tokens_in / ~0 tokens_out) and flagged the real 6/8 → 6/10 cost ramp as the notable trend.

🤖 Generated with Claude Code

chitcommit and others added 3 commits June 11, 2026 00:14
…t ledger

Replace stubbed worker internals with real implementations:
- getDb/getWriteDb helpers over porsager `postgres` driver on Hyperdrive
  connectionString (the old `env.NEON_COMPTROLLER.query()` was fictional).
- pullCFAIGatewayAnalytics: real CF AI Gateway /logs ingest for the 4 active
  gateways with KV high-water dedup, bounded pagination, batch INSERT into
  chittyops.cost_ledger; tolerant per-gateway failure.
- tierFromModel maps to CHECK-constraint-valid tiers (T0/T3_opus/T3_sonnet/
  T2_haiku/manual) — validated on a Neon temp branch (caught a tier-CHECK bug).
- detectAnomalies/isServiceExempt/budgetStatus/refreshCostLedgerView refactored
  to the real driver; matview refresh fail-soft on privilege.
- /api/v1/metrics, fetchDailyReport, listAnomalies, checkHardCaps: real queries.
- storeAnomalies/listAnomalies hit chittyops.anomalies (fixed stale comment).
- signHmac: real HMAC-SHA256, fail-closed when key absent.
- Notion + Quo emitters: real not-configured guards (Phase B), no fake content.
- Cold-start + 14d baseline-learning KV state set on first run (safe-state).
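A hypothetical `tierFromModel` consistent with the tier set listed above. The five tier strings come from this commit message; the matching rules (an `@cf/` prefix for T0, substring checks for the Claude families, `manual` as the catch-all) are assumptions chosen so every output satisfies the CHECK constraint.

```typescript
// Hypothetical sketch: tier values are from the commit; matching rules are assumed.

type Tier = "T0" | "T3_opus" | "T3_sonnet" | "T2_haiku" | "manual";

function tierFromModel(model: string | null | undefined): Tier {
  const m = (model ?? "").toLowerCase();
  if (m.startsWith("@cf/")) return "T0"; // Workers-AI models are T0 per SPEC
  if (m.includes("opus")) return "T3_opus";
  if (m.includes("sonnet")) return "T3_sonnet";
  if (m.includes("haiku")) return "T2_haiku";
  return "manual"; // unrecognized models still map to a CHECK-valid value
}
```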

WRITER-CONNECTION BLOCKER (Phase A): the Hyperdrive binding is read-only
(comptroller_reader). cost_ledger/anomalies writes require a SEPARATE RW
Hyperdrive binding (NEON_COMPTROLLER_WRITER). Until provisioned, getWriteDb()
fails closed and ingest is skipped (logged), so the poll never errors.
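A minimal sketch of the fail-closed writer lookup described above. The two binding names come from this commit; `writerConnection`, `ingestOrSkip`, and the `HyperdriveLike` shape (only the `.connectionString` property mentioned in the data-layer comment) are illustrative assumptions.

```typescript
// Sketch of a fail-closed writer lookup; helper names and types are hypothetical.

interface HyperdriveLike {
  connectionString: string;
}

interface DbEnv {
  NEON_COMPTROLLER: HyperdriveLike; // read-only (comptroller_reader)
  NEON_COMPTROLLER_WRITER?: HyperdriveLike; // RW binding, absent until provisioned
}

type WriterLookup = { ok: true; connectionString: string } | { ok: false; reason: string };

function writerConnection(env: DbEnv): WriterLookup {
  const w = env.NEON_COMPTROLLER_WRITER;
  if (!w || !w.connectionString) {
    // Fail closed: never fall back to the reader binding for writes.
    return { ok: false, reason: "NEON_COMPTROLLER_WRITER not bound" };
  }
  return { ok: true, connectionString: w.connectionString };
}

/** Logs and skips ingest instead of throwing, so the poll never errors. */
function ingestOrSkip(env: DbEnv, log: (msg: string) => void): boolean {
  const w = writerConnection(env);
  if (!w.ok) {
    log(`[ingest] skipped: ${w.reason}`);
    return false;
  }
  // A real implementation would open a client on w.connectionString and INSERT here.
  return true;
}
```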

Read path validated live: /api/v1/metrics returns real total_count=0 matching
Neon. Both INSERT column lists schema-validated on a disposable Neon branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…B clients

NEON_COMPTROLLER_WRITER Hyperdrive (4427ea04, comptroller_writer role,
append-only INSERT on cost_ledger+anomalies). Refactor DB access to
per-invocation postgres clients via AsyncLocalStorage scope, ending them
with ctx.waitUntil to avoid stale Hyperdrive clients across cron isolate reuse.
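The per-invocation scoping can be sketched with `AsyncLocalStorage` from `node:async_hooks`, which Workers expose under `nodejs_compat`. `ClientLike`, `withDbScope`, and `currentDb` are hypothetical; `makeClient` stands in for constructing a porsager `postgres` client from the Hyperdrive connection string, and `waitUntil` stands in for `ctx.waitUntil`.

```typescript
// Sketch of per-invocation DB scoping; client and context types are stand-ins.
import { AsyncLocalStorage } from "node:async_hooks";

interface ClientLike {
  end(): Promise<void>;
}

const dbScope = new AsyncLocalStorage<ClientLike>();

/** Runs `fn` with a fresh client visible via currentDb(), then schedules end(). */
async function withDbScope<T>(
  makeClient: () => ClientLike,
  waitUntil: (p: Promise<unknown>) => void,
  fn: () => Promise<T>,
): Promise<T> {
  const client = makeClient(); // new client per invocation, never cached on the isolate
  try {
    return await dbScope.run(client, fn);
  } finally {
    // Close after the work finishes without blocking the response.
    waitUntil(client.end());
  }
}

/** Returns the client for the current invocation, or throws outside a scope. */
function currentDb(): ClientLike {
  const c = dbScope.getStore();
  if (!c) throw new Error("currentDb() called outside withDbScope");
  return c;
}
```

Each cron or fetch invocation gets its own client, and `end()` is handed to `waitUntil` so the connection closes instead of lingering on a reused isolate.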

Verified live: cost_ledger 0 -> 1900+ rows across chittygateway + chittycounsel,
real cost/token mapping (chittycounsel $0.063 captured).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…insights

Adds AI categorization + deeper insight grounded in real cost_ledger data,
fulfilling "mini opportunities for ai categorization/deeper insight" with
genuine AI rather than static COA mapping. SPEC-compliant: Workers-AI @cf/*
models are T0, so this NEVER uses an LLM above T0.

Design — SQL owns every number, the model owns only prose:
- queryInsightsAggregates() computes all figures in SQL (today/all-time spend,
  per-service+tier+provider, 7-day daily trend, top models by cost and by
  call volume, workers-ai vs external-provider split).
- runInsightsModel() feeds those finished figures to @cf/meta/llama-3.1-8b-instruct
  ONCE (not per-row) and asks for narrative-only fields: per-service category +
  characterization, cost drivers, trend/anomaly notes, 2-4 grounded recs. The
  prompt forbids inventing/restating costs and editorializing magnitude.
- Numeric fields in the response come straight from the queries; only the prose
  comes from the model — so "grounded, no fabrication" is structural, and the
  figure cross-check trivially holds.
- response: {generated_at, window, totals, per_service[], drivers[], trends[],
  recommendations[], daily_trend[], top_models_by_cost[], top_models_by_calls[],
  model_used}. Parse failures surface raw model text (no fabricated fallback).
- Cached ~6h in KV_STATE (insights:{chicago-date}); ?refresh=1 bypasses. Never
  runs on the 5-min poll (avoids meta-cost).
- Empty-state: zero rows in window returns a clear empty result, skips the model.

wrangler.toml: adds [ai] binding = "AI" (free, on-account, no new secret).

Verified live at comptroller.chitty.cc/api/v1/insights — figures match a direct
Neon query exactly (today $0.077455/2055 calls, all-time $0.238430/4471 rows,
qwen3-embedding $0.069217). Model correctly characterized chittycounsel as an
embedding-heavy workload and flagged the real 6/8→6/10 cost ramp.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 11, 2026 01:27
@github-actions


@coderabbitai review
@copilot review
Adversarial review request: evaluate security, policy bypass paths, and regression risk.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026


Warning

Review limit reached

@chitcommit, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 6 minutes and 50 seconds.

Your organization has run out of usage credits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: add8a851-c4c0-426e-bf90-b962127c1ad6

📥 Commits

Reviewing files that changed from the base of the PR and between 7163222 and 7e3011d.

⛔ Files ignored due to path filters (1)
  • services/comptroller/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • services/comptroller/node-async-hooks.d.ts
  • services/comptroller/package.json
  • services/comptroller/tsconfig.json
  • services/comptroller/worker.ts
  • services/comptroller/wrangler.toml

Copilot AI left a comment

Contributor

Pull request overview

Adds a new /api/v1/insights endpoint to the Comptroller Worker that computes cost/usage aggregates in SQL and uses Workers AI (T0) to generate narrative-only insight, while also introducing a new Hyperdrive write path and AI Gateway log ingestion into chittyops.cost_ledger.

Changes:

  • Adds Workers AI binding + /api/v1/insights endpoint with KV caching and narrative-only model output.
  • Refactors Neon access to use per-invocation postgres clients (AsyncLocalStorage + Hyperdrive connection strings), plus adds optional writer binding and ingestion/insert paths.
  • Introduces a standalone TypeScript package setup for the service (tsconfig, package.json, pnpm lock).

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| services/comptroller/wrangler.toml | Enables nodejs_compat, adds AI binding + Hyperdrive writer binding, adjusts routes/triggers. |
| services/comptroller/worker.ts | Adds per-invocation DB scoping, AI Gateway ingestion + inserts, and /api/v1/insights implementation. |
| services/comptroller/tsconfig.json | Adds strict TS config for the Worker package. |
| services/comptroller/package.json | Adds service-local dependencies/scripts (wrangler/tsc/postgres). |
| services/comptroller/pnpm-lock.yaml | Locks service-local dependency graph. |
| services/comptroller/node-async-hooks.d.ts | Adds minimal ambient typing for AsyncLocalStorage under nodejs_compat. |
Files not reviewed (1)
  • services/comptroller/pnpm-lock.yaml: Language not supported

Comment on lines 10 to 12
routes = [
{ pattern = "comptroller.chitty.cc/*", custom_domain = true }
{ pattern = "comptroller.chitty.cc", custom_domain = true }
]
Comment on lines +428 to +443
const rows = fresh.map((l) => ({
service: gw,
tier: tierFromModel(l.model),
provider: l.provider ?? "unknown",
model: l.model ?? "unknown",
tokens_in: Math.round(l.tokens_in ?? 0),
tokens_out: Math.round(l.tokens_out ?? 0),
cached_tokens_in: Math.round(l.usage_metadata?.input_cached_tokens ?? 0),
cost_usd: Number(l.cost ?? 0),
latency_ms: Math.round(l.timings?.latency ?? 0),
item_id_hash: l.id,
run_id: null as string | null,
fallback_chain: null as string[] | null,
ts: l.created_at,
cost_constrained: false,
}));
Comment on lines +534 to +546
const perService = (await db`
SELECT service,
coalesce(sum(cost_usd),0)::float8 AS cost_usd,
count(*)::int AS calls,
coalesce(sum(tokens_in),0)::bigint AS tokens_in,
coalesce(sum(tokens_out),0)::bigint AS tokens_out,
(array_agg(provider ORDER BY cost_usd DESC NULLS LAST))[1] AS top_provider,
(array_agg(tier ORDER BY cost_usd DESC NULLS LAST))[1] AS top_tier
FROM chittyops.cost_ledger
WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago')
GROUP BY service
ORDER BY cost_usd DESC
`) as any[];
Comment on lines +558 to +566
const modelsByCost = (await db`
SELECT model, (array_agg(provider))[1] AS provider,
coalesce(sum(cost_usd),0)::float8 AS cost_usd, count(*)::int AS calls
FROM chittyops.cost_ledger
WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago') - interval '6 days'
GROUP BY model
ORDER BY cost_usd DESC
LIMIT 5
`) as any[];
Comment on lines +568 to +576
const modelsByCalls = (await db`
SELECT model, (array_agg(provider))[1] AS provider,
coalesce(sum(cost_usd),0)::float8 AS cost_usd, count(*)::int AS calls
FROM chittyops.cost_ledger
WHERE ts >= date_trunc('day', now() AT TIME ZONE 'America/Chicago') - interval '6 days'
GROUP BY model
ORDER BY calls DESC
LIMIT 5
`) as any[];
Comment on lines +12 to 18
* Data layer:
* - READ : env.NEON_COMPTROLLER (Hyperdrive, comptroller_reader, read-only) → getDb(env)
* - WRITE : env.NEON_COMPTROLLER_WRITER (Hyperdrive over RW role) → getWriteDb(env)
* Both Hyperdrive bindings expose a `.connectionString`; we drive them with
* porsager `postgres` (works on Workers over Hyperdrive's TCP socket).
* getWriteDb() FAILS CLOSED if the writer binding is absent (Phase-A blocker).
*/

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e3011d873

Comment on lines +475 to +476
if (maxSeen > hwmMs) {
await env.KV_STATE.put(hwmKey, new Date(maxSeen).toISOString());

P1: Do not advance the high-water mark past unprocessed pages

When a gateway has more than MAX_PAGES_PER_GATEWAY * LOGS_PER_PAGE new log rows, this loop stops after the page cap but still stores maxSeen from the newest fetched page. On the next poll, fresh only keeps rows with created_at > hwmMs, so every older log that was beyond the page cap is skipped permanently, undercounting cost_ledger after initial deploys, downtime, or high-volume bursts. Only advance the high-water mark once the pagination has reached the old high-water mark/end, or keep a cursor that does not discard the unprocessed tail.
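A sketch of the first remedy the comment proposes: advance the high-water mark only after pagination reaches the old mark or the end of the log. `LogRow`, `fetchPage`, and `collectSinceHwm` are hypothetical, pages are assumed to be ordered newest-first, and fetching is synchronous here to keep the example small.

```typescript
// Sketch of a high-water-mark (HWM) update that never skips an unread tail.
// Row shape and fetchPage are hypothetical; pages are assumed newest-first.

interface LogRow {
  id: string;
  createdAtMs: number;
}

/**
 * Walks pages until it sees a row at or below the old HWM, or an empty page.
 * Returns rows to insert and the HWM to store (null means keep the old one).
 */
function collectSinceHwm(
  fetchPage: (page: number) => LogRow[],
  hwmMs: number,
  maxPages: number,
): { fresh: LogRow[]; newHwmMs: number | null } {
  const fresh: LogRow[] = [];
  let maxSeen = hwmMs;
  let caughtUp = false;
  for (let page = 1; page <= maxPages; page++) {
    const rows = fetchPage(page);
    if (rows.length === 0) {
      caughtUp = true; // reached the end of the log
      break;
    }
    for (const r of rows) {
      if (r.createdAtMs > hwmMs) {
        fresh.push(r);
        maxSeen = Math.max(maxSeen, r.createdAtMs);
      } else {
        caughtUp = true; // reached already-ingested rows
      }
    }
    if (caughtUp) break;
  }
  // Advance only when everything between the old HWM and now was read.
  return { fresh, newHwmMs: caughtUp && maxSeen > hwmMs ? maxSeen : null };
}
```

A capped run inserts what it read but keeps the old mark, so the next poll re-reads those newest rows; that stays correct only with idempotent inserts, such as the ON CONFLICT approach raised in a later comment. If the backlog exceeds the cap on every poll the mark never advances, which is where the comment's alternative (a cursor that preserves the unprocessed tail) would be needed.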

const enc = new TextEncoder();
const a = enc.encode(presented);
const b = enc.encode(key);
if (a.byteLength !== b.byteLength) return false;

P2: Avoid returning before constant-time comparison

For _admin/poll, any request with a bearer token length different from COMPTROLLER_HMAC_KEY returns before timingSafeEqual, so an attacker who can repeatedly time requests can learn the secret length even though this helper is documented as constant-time. Cloudflare’s own timingSafeEqual guidance warns not to return early on length mismatch; compare a same-length dummy/self buffer and then fail instead.
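A sketch of a comparison that does not return early on a length mismatch. It uses a plain XOR loop, swapped in so the example runs anywhere; on Workers the same idea applies with `crypto.subtle.timingSafeEqual`, which expects equal-length inputs. `constantTimeEqual` is a hypothetical name.

```typescript
// Length-hiding constant-time comparison (illustrative XOR loop).
function constantTimeEqual(presented: string, key: string): boolean {
  const enc = new TextEncoder();
  const a = enc.encode(presented);
  const b = enc.encode(key);
  // On a length mismatch, compare the key against itself so the loop still
  // runs over key-length bytes, and record the mismatch instead of returning.
  const lhs = a.byteLength === b.byteLength ? a : b;
  let diff = a.byteLength ^ b.byteLength;
  for (let i = 0; i < b.byteLength; i++) {
    diff |= lhs[i]! ^ b[i]!;
  }
  return diff === 0;
}
```

The loop length depends only on the secret key, so response timing no longer reveals whether the presented token had the right length.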

Comment on lines +295 to +298
const baseline = await env.KV_STATE.get(BASELINE_LEARNING_KEY);
if (!baseline) {
const until = new Date(Date.now() + BASELINE_LEARNING_DAYS * 24 * 3600 * 1000).toISOString();
await env.KV_STATE.put(BASELINE_LEARNING_KEY, until);

P2: Preserve manual baseline-learning end

When an operator uses /_admin/baseline_learning/end, it deletes this KV key, but the next scheduled run, status request, or admin poll calls ensureColdStartState() and treats the missing key as a fresh deploy by writing a new 14-day window. Since pollMetrics() suppresses L2/L3 while isBaselineLearningActive() is true, the documented manual override is immediately undone and Comptroller remains L1-only; store an initialized/disabled marker or set an expired timestamp instead of recreating after deletion.
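A sketch of the marker approach the comment suggests. `KvLike` is a synchronous stand-in for Workers KV, both key strings are assumptions, and only the 14-day window comes from this PR. The manual end writes an already-expired timestamp, and a separate initialized marker stops `ensureColdStartState` from reopening the window even if the key is later deleted.

```typescript
// Sketch: distinguish "never initialized" from "ended manually".
// KvLike is a synchronous stand-in for KV; key names are hypothetical.

interface KvLike {
  get(key: string): string | null;
  put(key: string, value: string): void;
}

const BASELINE_LEARNING_KEY = "baseline_learning_until"; // assumed key name
const BASELINE_INITIALIZED_KEY = "baseline_learning_initialized"; // hypothetical marker
const BASELINE_LEARNING_DAYS = 14;

function ensureColdStartState(kv: KvLike, nowMs: number): void {
  if (kv.get(BASELINE_INITIALIZED_KEY)) return; // set up once already: respect manual end
  const until = new Date(nowMs + BASELINE_LEARNING_DAYS * 24 * 3600 * 1000).toISOString();
  kv.put(BASELINE_LEARNING_KEY, until);
  kv.put(BASELINE_INITIALIZED_KEY, "1");
}

/** Manual override: expire the window instead of deleting the key. */
function endBaselineLearning(kv: KvLike, nowMs: number): void {
  kv.put(BASELINE_LEARNING_KEY, new Date(nowMs).toISOString());
}

function isBaselineLearningActive(kv: KvLike, nowMs: number): boolean {
  const until = kv.get(BASELINE_LEARNING_KEY);
  return until !== null && Date.parse(until) > nowMs;
}
```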

Comment on lines +1055 to +1058
const anomalyCount = (await db`
SELECT count(*)::int AS n FROM chittyops.anomalies
WHERE detected_at >= date_trunc('day', now() AT TIME ZONE 'America/Chicago')
`) as any[];

P2: Add storage for the new anomalies query

I searched repo-wide for CREATE TABLE definitions for anomalies and only found references; the migrations create cost_ledger and pause_exemptions, but not chittyops.anomalies. On a fresh deploy following the checked-in migrations/runbook, /reports/daily now executes this unguarded query and returns 500, which also breaks the AGENTS deploy verification step for /reports/daily; add the table migration or fail-soft here like listAnomalies() does.
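If the migration is not added first, the fail-soft option could look like this. `QueryFn` stands in for the tagged-template `db` call and `anomalyCountFailSoft` is a hypothetical name; returning `null` rather than `0` keeps a missing table distinguishable from a day with no anomalies.

```typescript
// Sketch of a fail-soft anomaly count; QueryFn stands in for the db call.

type QueryFn = () => Promise<{ n: number }[]>;

/** Returns the count, or null (with a logged error) if the query fails. */
async function anomalyCountFailSoft(
  query: QueryFn,
  log: (msg: string, err: unknown) => void,
): Promise<number | null> {
  try {
    const rows = await query();
    return rows[0]?.n ?? 0;
  } catch (e) {
    log("[fetchDailyReport] anomalies query failed; reporting null", e);
    return null; // the report can still return 200 with an explicit gap
  }
}
```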

Comment on lines +445 to +446
await writeDb`
INSERT INTO chittyops.cost_ledger ${writeDb(

P2: Make gateway log inserts idempotent

If the 5-minute cron overlaps with /_admin/poll or a previous long-running poll, both invocations can read the same KV high-water mark before either stores the new value and then execute this insert for the same Cloudflare log IDs. The cost ledger migration has no uniqueness constraint on item_id_hash, so those races double-count calls and cost; use a per-gateway lock/cursor or a unique key with ON CONFLICT DO NOTHING before enabling concurrent manual and scheduled ingest.

Comment on lines +947 to +948
console.error("[isServiceExempt] query failed:", e);
return false; // fail-safe: do not block on lookup failure for L2/L3 gate caller

P1: Fail closed when exemption lookup fails

If the exemption query fails, this returns false, and emitL3Signal() treats a protected service as non-exempt and posts to its pause endpoint without SMS confirmation. This is likely in the checked-in setup because I found no migration granting comptroller_reader access to chittyops.pause_exemptions; the previous uncaught error would have aborted the pause path, while this catch turns permission/table outages into fail-open behavior against the AGENTS hard rule to never pause exempt services without explicit SMS confirm.
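The fail-closed rule reduces to a small gate. `ExemptionLookup` and `mayPauseWithoutSms` are hypothetical; the sketch only shows that a failed lookup must be treated like an exempt service, so the pause path falls back to SMS confirmation.

```typescript
// Sketch of a fail-closed exemption gate; types and names are hypothetical.

type ExemptionLookup =
  | { ok: true; exempt: boolean }
  | { ok: false; error: unknown };

/** True only when the lookup succeeded AND the service is not exempt. */
function mayPauseWithoutSms(lookup: ExemptionLookup): boolean {
  if (!lookup.ok) return false; // unknown status: treat as exempt, require SMS confirm
  return !lookup.exempt;
}
```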

tokens_out: Math.round(l.tokens_out ?? 0),
cached_tokens_in: Math.round(l.usage_metadata?.input_cached_tokens ?? 0),
cost_usd: Number(l.cost ?? 0),
latency_ms: Math.round(l.timings?.latency ?? 0),

P2: Read latency from the gateway log duration

Cloudflare's List Gateway Logs response documents duration on each LogListResponse and does not include a timings.latency object, so every ingested row from this endpoint will store latency_ms as 0. That makes the materialized view's average and p95 latency fields unusable for dashboards/anomaly analysis even though the source API provides the value; map the documented duration field instead.
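A sketch of reading latency from `duration` instead of `timings.latency`. `GatewayLog` lists only fields this PR reads plus `duration`; the field types and the assumption that `duration` is in milliseconds should be checked against the Cloudflare API reference before use.

```typescript
// Sketch: map latency_ms from the documented `duration` field.
// Field types and the millisecond unit are assumptions.

interface GatewayLog {
  id: string;
  created_at: string;
  model?: string | null;
  provider?: string | null;
  tokens_in?: number | null;
  tokens_out?: number | null;
  cost?: number | string | null;
  duration?: number | null;
}

/** Rounded latency in ms; 0 when duration is missing or not numeric. */
function latencyMs(l: GatewayLog): number {
  const d = Number(l.duration ?? 0);
  return Number.isFinite(d) ? Math.round(d) : 0;
}
```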

@chitcommit chitcommit enabled auto-merge (squash) June 15, 2026 05:10
@chitcommit

Contributor Author

@claude resolve conflicts
