3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -79,7 +79,8 @@ Workspaces: `packages/*`, `services/*`, `apps/*`, `infra/*`, `tests/*`, `scripts

- better-auth + GitHub OAuth. Middleware enriches Hono context with the user; frontend `AuthProvider` (`apps/web/src/features/auth/context.tsx`) mirrors session state and sends `credentials: 'include'`.
- Rate limits (submission + poll tiers) defined in `services/api-gateway/src/middleware/rate-limit.ts`; Redis-backed via `hono-rate-limiter`.
- Per-plan quotas in `packages/shared/src/plan.ts`.
- `plan` is the scheduling **class** (queue priority + SLO bucket); numeric **quota** is resolved separately. `PlanResolver.resolve` returns `ResolvedAccount = { plan, limits }` where `limits` is `EffectiveLimits` (`submissionsPerMinute`, `maxConcurrentJobs`, `maxSequenceLength`, `sloSeconds`) — class defaults merged with a per-account override from the `user.limits jsonb` column (`mergeLimits` + `OverrideLimitsSchema` in `packages/shared/src/plan.ts`; absent/invalid → class default). Consumers read `auth.limits.*` off the auth context, never `PLAN_LIMITS[plan]` at the call site.
- Overrides are admin-only: `PUT/DELETE /admin/accounts/{userId}/limits` (`services/api-gateway/src/admin/limits.ts`); never user-editable. PUT is full-replace; DELETE clears to class defaults.

## Errors

5 changes: 5 additions & 0 deletions infra/postgres/seeds/dev-users-plan.sql
@@ -1,3 +1,8 @@
-- Seed dev-only user plan overrides. Safe to re-run.
UPDATE "user" SET plan = 'pro' WHERE email = 'dev-pro@example.com';
UPDATE "user" SET plan = 'free' WHERE email = 'dev-free@example.com';

-- Example per-account limit override: a pro account with raised quotas.
UPDATE "user"
SET limits = '{"submissionsPerMinute": 120, "maxConcurrentJobs": 25}'::jsonb
WHERE email = 'dev-pro@example.com';
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-06-18
@@ -0,0 +1,77 @@
## Context

`DbPlanResolver.resolve(userId)` returns a bare `Plan` enum read from `SELECT plan FROM "user"`. Every quota decision then does its own `PLAN_LIMITS[plan]` lookup:

- `middleware/rate-limit.ts` → `submissionsPerMinute`
- `routes/_utils.ts` (`withinConcurrentJobLimit`) → `maxConcurrentJobs`
- submission validation → `MAX_SEQUENCE_LENGTH` (a single global constant, not yet per-plan)
- shedding → `sloSeconds[plan]` from `loadSheddingConfig`

Because the only DB-derived input is the class name, all accounts on a class share identical quotas. The `fix/shedding-residue-leak` branch added `RATE_LIMIT_SUBMISSIONS_*` env config, but those are global-per-class defaults (load-test driven) — they cannot express two enterprise customers with different agreements. This design introduces per-account overrides resolved at the existing resolver seam.

## Goals / Non-Goals

**Goals:**

- A single resolution seam returns merged `EffectiveLimits`; call sites stop indexing `PLAN_LIMITS` directly for quota.
- Per-account overrides stored in `user.limits jsonb`, sparse and partial.
- Override envelope: `submissionsPerMinute`, `maxConcurrentJobs`, `maxSequenceLength`, `sloSeconds`.
- Admin-only management surface; never user-editable.
- Plan enum preserved for priority + SLO class.

**Non-Goals:**

- Enterprise self-serve UI or billing/entitlement-provider integration.
- Removing or replacing the `RATE_LIMIT_SUBMISSIONS_*` env knobs (they remain the per-class defaults).
- Per-account _priority_ or _class_ overrides — scheduling stays class-based.

## Decisions

### 1. Resolver returns `EffectiveLimits`, not `Plan`

`PlanResolver.resolve` becomes `resolve(userId): Promise<ResolvedAccount>` where `ResolvedAccount = { plan: Plan; limits: EffectiveLimits }`. `limits` is computed as `mergeLimits(classDefaults, override)`, where `classDefaults` is `PLAN_LIMITS[plan]` plus the class SLO default. Call sites read `auth.limits.*`; scheduling reads `auth.plan`.

_Alternative considered:_ keep returning `Plan` and add a parallel `resolveLimits`. Rejected — two lookups invite drift and a second DB round-trip; one resolved object keeps the merge in one place.
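
A minimal sketch of the resolved shape, assuming the field names listed in this design. The numbers in `CLASS_DEFAULTS`, the `UserRow` type, `toResolvedAccount`, and the in-memory resolver are illustrative placeholders rather than the gateway's actual code; the real defaults come from `PLAN_LIMITS` and the shedding config.

```typescript
type Plan = "free" | "pro" | "enterprise";

interface EffectiveLimits {
  submissionsPerMinute: number;
  maxConcurrentJobs: number;
  maxSequenceLength: number;
  sloSeconds: number;
}

interface ResolvedAccount {
  plan: Plan; // scheduling class: queue priority + SLO bucket
  limits: EffectiveLimits; // numeric quota, already merged
}

interface PlanResolver {
  resolve(userId: string): Promise<ResolvedAccount>;
}

// Placeholder numbers (assumption); not the project's real class defaults.
const CLASS_DEFAULTS: Record<Plan, EffectiveLimits> = {
  free: { submissionsPerMinute: 10, maxConcurrentJobs: 2, maxSequenceLength: 1000, sloSeconds: 600 },
  pro: { submissionsPerMinute: 60, maxConcurrentJobs: 10, maxSequenceLength: 1000, sloSeconds: 120 },
  enterprise: { submissionsPerMinute: 300, maxConcurrentJobs: 50, maxSequenceLength: 1000, sloSeconds: 60 },
};

// One row carries both the class and the sparse override, so one read suffices.
interface UserRow {
  plan: Plan;
  limits: Partial<EffectiveLimits> | null;
}

function toResolvedAccount(row: UserRow): ResolvedAccount {
  return {
    plan: row.plan,
    limits: { ...CLASS_DEFAULTS[row.plan], ...(row.limits ?? {}) },
  };
}

// Hypothetical in-memory stand-in for DbPlanResolver.
class InMemoryPlanResolver implements PlanResolver {
  private readonly rows: Map<string, UserRow>;

  constructor(rows: Map<string, UserRow>) {
    this.rows = rows;
  }

  async resolve(userId: string): Promise<ResolvedAccount> {
    // Assumption: an unknown user is treated as a free account with no override.
    const row: UserRow = this.rows.get(userId) ?? { plan: "free", limits: null };
    return toResolvedAccount(row);
  }
}
```

Because the merge happens inside the single `resolve` call, a caller never needs a second lookup to learn an account's quota.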

### 2. Carry resolved limits on the auth context

The auth context already exposes `{ sub, plan }`. Extend it to `{ sub, plan, limits }` so middleware and route utils read the pre-merged object without re-querying. The DB read already happens during plan resolution; the override travels in the same row (`SELECT plan, limits FROM "user"`), so no extra query.
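
As a rough illustration of what consumers look like once `limits` rides on the auth context, the sketch below reads the pre-merged values directly. The function signatures are assumptions (the real `withinConcurrentJobLimit` in `routes/_utils.ts` may take different arguments), `submissionBudget` is a hypothetical helper, and the strict `<` comparison is a guess at the boundary semantics.

```typescript
type Plan = "free" | "pro" | "enterprise";

interface EffectiveLimits {
  submissionsPerMinute: number;
  maxConcurrentJobs: number;
  maxSequenceLength: number;
  sloSeconds: number;
}

// Shape described in this decision: { sub, plan, limits }.
interface AuthContext {
  sub: string;
  plan: Plan;
  limits: EffectiveLimits;
}

// Hypothetical signature: true while the account may start another job.
function withinConcurrentJobLimit(auth: AuthContext, activeJobs: number): boolean {
  return activeJobs < auth.limits.maxConcurrentJobs;
}

// Hypothetical helper: the per-minute budget a rate limiter would apply.
function submissionBudget(auth: AuthContext): number {
  return auth.limits.submissionsPerMinute;
}
```

Neither function indexes `PLAN_LIMITS` or touches the database; both depend only on the object populated at auth time.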

### 3. jsonb sparse override, validated with Zod

`limits jsonb` (nullable). Stored value is a partial object; a Zod schema (`OverrideLimitsSchema`) validates positive integers per field and `.strict()` rejects unknown keys. jsonb chosen for flexibility — the agreement-shaped envelope is expected to grow and is rarely touched, so typed columns + migrations per field would be churn for little gain. The merge is `{ ...defaults, ...parsed }`; a parse failure is logged and treated as no override (fail-safe to class defaults).

_Alternative considered:_ typed columns. Rejected for now — premature rigidity; revisit if the envelope stabilizes or needs DB-level constraints.
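
The parse-then-merge behavior can be sketched without dependencies. Here `parseOverride` is a hand-rolled stand-in for `OverrideLimitsSchema` (the real schema is Zod-based and also applies `.max()` caps, which this sketch omits), and this `mergeLimits` takes the raw column value, which is an assumption about the real helper's signature.

```typescript
interface EffectiveLimits {
  submissionsPerMinute: number;
  maxConcurrentJobs: number;
  maxSequenceLength: number;
  sloSeconds: number;
}

type LimitsOverride = Partial<EffectiveLimits>;

const FIELDS: readonly string[] = [
  "submissionsPerMinute",
  "maxConcurrentJobs",
  "maxSequenceLength",
  "sloSeconds",
];

// Stand-in validator: positive integers only, unknown keys rejected (strict).
function parseOverride(raw: unknown): LimitsOverride | null {
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) return null;
  const out: LimitsOverride = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!FIELDS.includes(key)) return null;
    if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) return null;
    out[key as keyof EffectiveLimits] = value;
  }
  return out;
}

// NULL column means no override; a parse failure is logged and ignored.
function mergeLimits(defaults: EffectiveLimits, raw: unknown): EffectiveLimits {
  if (raw === null || raw === undefined) return { ...defaults };
  const parsed = parseOverride(raw);
  if (parsed === null) {
    console.warn("invalid limits override; falling back to class defaults");
    return { ...defaults };
  }
  return { ...defaults, ...parsed };
}
```

With this shape, a row edited by hand into an invalid state degrades to class defaults instead of failing the request.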

### 4. `maxSequenceLength` becomes plan-aware

Today `MAX_SEQUENCE_LENGTH` is one global constant. The default per-class value seeds from that constant for all classes (no behavior change by default); the override can raise it per account. Submission validation reads `auth.limits.maxSequenceLength`.

### 5. SLO seconds resolved through the same object

`loadSheddingConfig().sloSeconds` provides the class default. The resolver overlays a per-account `sloSeconds` when present. The shedding decision reads the account's resolved value rather than indexing the global config by plan. Default behavior unchanged when no override is set.

### 6. Admin route mirrors flag overrides

`PUT /admin/accounts/{userId}/limits` (validate + persist) and `DELETE` (clear → NULL), guarded by the same admin authorization as `PUT /admin/flags/<name>`. No GET on a user-facing route; an admin GET is optional and in scope for the admin surface only.

**PUT is full-replace**: the request body becomes the entire stored override object (any field not present is dropped, not merged). `DELETE` sets the column NULL to revert to class defaults. Chosen over PATCH/merge for predictability — there is one obvious way to read or reset an account's override, and "clear one field" needs no null-sentinel convention. PATCH deferred unless an operator workflow demands it.
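
To make the full-replace semantics concrete, the sketch below models the `limits` column as an in-memory map. `putLimits`, `deleteLimits`, and the `store` map are hypothetical names for illustration only; the real route validates the body and writes to Postgres.

```typescript
type LimitsOverride = Partial<{
  submissionsPerMinute: number;
  maxConcurrentJobs: number;
  maxSequenceLength: number;
  sloSeconds: number;
}>;

// Stand-in for the nullable `user.limits` column, keyed by user id.
const store = new Map<string, LimitsOverride | null>();

// PUT: the body becomes the entire stored override; nothing is merged.
function putLimits(userId: string, body: LimitsOverride): void {
  store.set(userId, { ...body });
}

// DELETE: set the column to NULL so the account reverts to class defaults.
function deleteLimits(userId: string): void {
  store.set(userId, null);
}
```

For example, a `putLimits` call with `{ submissionsPerMinute: 120, maxConcurrentJobs: 25 }` followed by one with `{ submissionsPerMinute: 200 }` leaves only `submissionsPerMinute` stored, so `maxConcurrentJobs` returns to the class default without any null-sentinel convention.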

## Risks / Trade-offs

- **Auth-context shape change ripples to every consumer** → Centralize the type in one place (`Variables`/auth type) and let the typechecker enumerate call sites; the compiler turns the ripple into a checklist.
- **jsonb lets malformed data reach the row out-of-band** (manual SQL, bad seed) → Resolver parses defensively and falls back to class defaults on any parse failure; never throws into the request path.
- **Override silently widens limits beyond infra capacity** (e.g. someone sets `submissionsPerMinute: 10^6`) → Validation caps with sane `.max()` bounds; shedding remains the backstop since it operates on observed throughput, not declared limits.
- **Drift between class default sources** (`PLAN_LIMITS`, `RATE_LIMIT_SUBMISSIONS_*` env, SLO config) → Resolver composes the class default from the same sources the gateway already loads at boot; no new default table.
- **Stale resolved limits within a request** → Limits resolve per request at auth time (same cadence as plan today); an override change takes effect on the next request, consistent with current plan-change behavior.

## Migration Plan

1. Add nullable `user.limits jsonb` column (additive migration; no backfill — NULL = class defaults).
2. Ship resolver + auth-context change; all defaults equal current behavior, so deploying with zero overrides is a no-op.
3. Add admin route last; until it exists, overrides can only be set via seed/SQL (used for the dev seed example).
4. **Rollback:** revert code (resolver falls back to class defaults regardless of column); the nullable column can remain unused with no effect.

## Open Questions

- Do we want an audit-log line on override set/clear? Likely yes (reuse the submission/admin logging idiom), but not blocking.
@@ -0,0 +1,31 @@
## Why

Today `DbPlanResolver` reads only the `plan` enum name from Postgres, and every downstream consumer does a static `PLAN_LIMITS[plan]` lookup. As a result, "enterprise" is just a larger static tier (a generous pro), and two enterprise customers with different contractual limits cannot be expressed. The original intent was for enterprise limits to follow the per-customer agreement. This change implements that intent without turning every quota into a deployment-global env knob, which is what the load-test-motivated `RATE_LIMIT_SUBMISSIONS_*` config does and which cannot differentiate two accounts on the same plan.

## What Changes

- Split the meaning carried by the `plan` enum into two concerns:
  - **Class**: queue priority and SLO bucket. Stays an enum (`free` | `pro` | `enterprise`); a small ordered set is correct for scheduling.
  - **Quota**: numeric entitlements (`submissionsPerMinute`, `maxConcurrentJobs`, `maxSequenceLength`, SLO seconds). Becomes per-account overridable.
- Add a nullable `limits jsonb` column to the Postgres `user` table holding a sparse, partial override object. A NULL column or an absent field falls back to the plan default.
- Change the plan-resolution seam: instead of a bare `Plan` enum, the resolver returns the enum together with **EffectiveLimits** (`{ ...PLAN_LIMITS[class], ...account.overrides }`, merged field-by-field). The enum is still exposed for priority + SLO-class selection.
- Update consumers to read resolved limits: `middleware/rate-limit.ts` (submissions/min), `routes/_utils.ts` (concurrency), submission validation (`MAX_SEQUENCE_LENGTH`), and shedding SLO seconds.
- Add an **admin-only** endpoint to set/clear an account's override (mirrors the existing `PUT /admin/flags/<name>` pattern). No enterprise self-serve — overrides are never user-editable.

## Capabilities

### New Capabilities

- `plan-limits`: Resolution of an account's effective quota limits — merging per-account DB overrides over plan-class defaults — and the admin surface for managing those overrides. Establishes the class-vs-quota boundary that scheduling (priority/SLO) and quota enforcement (rate limit/concurrency/sequence length) both consume.

### Modified Capabilities

<!-- request-shedding lives only in the unarchived fix-shedding-residue-leak change delta; no published spec to amend. SLO-seconds resolution is captured as a requirement in the new plan-limits capability. -->

## Impact

- **DB**: new `user.limits jsonb` column (nullable) + migration; dev seed (`infra/postgres/seeds/dev-users-plan.sql`) gains an example override.
- **Code**: `packages/shared/src/plan.ts` (`PlanResolver` contract → `EffectiveLimits`, override-merge helper, Zod override schema), `services/api-gateway/src/plan/db-resolver.ts`, `middleware/rate-limit.ts`, `routes/_utils.ts`, shedding SLO lookup, submission length validation, new `admin/limits.ts` route.
- **API**: new admin route `PUT/DELETE /admin/accounts/{userId}/limits`; no change to public submission contracts (behavior differs only by resolved numbers).
- **Auth/PII**: override values are non-PII operational numbers; admin route reuses existing admin authorization.
- **Out of scope**: enterprise self-serve UI, billing integration, the `RATE_LIMIT_SUBMISSIONS_*` env knobs (they remain as global-per-class defaults feeding `PLAN_LIMITS`).
@@ -0,0 +1,69 @@
## ADDED Requirements

### Requirement: Plan class versus quota separation

The system SHALL treat an account's `plan` as a scheduling **class** (queue priority and SLO bucket) and SHALL resolve numeric **quota** limits separately, so that two accounts on the same class MAY enforce different quotas.

The quota envelope SHALL consist of: `submissionsPerMinute`, `maxConcurrentJobs`, `maxSequenceLength`, and `sloSeconds`.

#### Scenario: Class still drives scheduling

- **WHEN** an account's queue priority or SLO bucket is selected
- **THEN** the system uses the `plan` enum (`free` | `pro` | `enterprise`) and ignores any per-account quota override

#### Scenario: Quota resolved independently of class

- **WHEN** a quota limit is enforced for an account
- **THEN** the system uses the resolved effective limit, not a direct `PLAN_LIMITS[plan]` lookup at the call site

### Requirement: Effective limit resolution

The plan resolver SHALL return an `EffectiveLimits` object computed by merging a per-account override over the plan-class default on a field-by-field basis. An absent override object, an absent field, or a resolution failure SHALL fall back to the plan-class default for that field.

#### Scenario: No override present

- **WHEN** an account has no `limits` override stored
- **THEN** every effective limit equals the corresponding `PLAN_LIMITS[plan]` (and SLO default) value

#### Scenario: Partial override merges field-by-field

- **WHEN** an account's override sets only `maxConcurrentJobs`
- **THEN** `maxConcurrentJobs` uses the override value and all other limits fall back to the plan-class default

#### Scenario: Resolution failure is safe

- **WHEN** the override store read fails or returns an unparseable value
- **THEN** the resolver logs a warning and returns the plan-class defaults without throwing

### Requirement: Override storage and validation

Per-account overrides SHALL be stored in a nullable `limits jsonb` column on the Postgres `user` table as a sparse partial object. The system SHALL validate override values against a schema before applying them: each present field MUST be a positive integer (or zero where the underlying default permits zero, e.g. an SLO bucket), and unknown fields SHALL be rejected.

#### Scenario: Valid sparse override accepted

- **WHEN** an override `{ "submissionsPerMinute": 1000 }` is validated
- **THEN** validation passes and only `submissionsPerMinute` is overridden

#### Scenario: Invalid override rejected

- **WHEN** an override contains a negative number, a non-integer, or an unknown field
- **THEN** validation fails and the override is not persisted

### Requirement: Admin-only override management

The system SHALL expose admin-only endpoints to set and clear an account's limit override, mirroring the existing admin flag-override authorization. Overrides SHALL NOT be settable or viewable through any user-facing (non-admin) route.

#### Scenario: Admin sets an override

- **WHEN** an authenticated admin sends a valid override for an account via the admin route
- **THEN** the override is persisted and subsequent submissions for that account enforce the merged effective limits

#### Scenario: Admin clears an override

- **WHEN** an admin clears an account's override
- **THEN** the `limits` column is reset and the account reverts to plan-class defaults

#### Scenario: Non-admin cannot set an override

- **WHEN** a non-admin caller attempts to set or read an account override
- **THEN** the request is rejected by admin authorization
@@ -0,0 +1,40 @@
## 1. Shared resolution layer (`packages/shared`)

- [x] 1.1 Define `EffectiveLimits` type (`submissionsPerMinute`, `maxConcurrentJobs`, `maxSequenceLength`, `sloSeconds`) and `ResolvedAccount = { plan: Plan; limits: EffectiveLimits }` in `plan.ts`
- [x] 1.2 Add `OverrideLimitsSchema` (Zod, `.strict()`, positive-int fields with sane `.max()` caps) and an exported `LimitsOverride` type
- [x] 1.3 Add `mergeLimits(classDefaults, override)` field-by-field merge helper; absent/invalid → class default
- [x] 1.4 Change `PlanResolver.resolve` contract to return `ResolvedAccount`; export `MAX_SEQUENCE_LENGTH` as the per-class default seed
- [x] 1.5 Unit-test merge + schema (no override, partial override, invalid/unknown-field rejection)

## 2. DB resolver + auth context (`services/api-gateway`)

- [x] 2.1 Extend the user query to `SELECT plan, limits FROM "user"` and parse `limits` defensively in `DbPlanResolver`
- [x] 2.2 Resolver composes class defaults from existing config sources (`PLAN_LIMITS`, `RATE_LIMIT_SUBMISSIONS_*`, shedding `sloSeconds`) and merges the override; parse failure logs + falls back
- [x] 2.3 Extend the auth/`Variables` context type from `{ sub, plan }` to `{ sub, plan, limits }`; populate at resolution
- [x] 2.4 Update `DbPlanResolver` unit tests (no override, partial override, read failure → defaults)

## 3. Quota consumers read resolved limits

- [x] 3.1 `middleware/rate-limit.ts`: read `auth.limits.submissionsPerMinute` instead of `PLAN_LIMITS[plan]`
- [x] 3.2 `routes/_utils.ts` (`withinConcurrentJobLimit`): read `auth.limits.maxConcurrentJobs`
- [x] 3.3 Submission validation: enforce `auth.limits.maxSequenceLength` per request
- [x] 3.4 Shedding decision: use the account's resolved `sloSeconds` rather than indexing global config by plan
- [x] 3.5 Update affected unit tests; confirm defaults reproduce current behavior with no override

## 4. Persistence

- [x] 4.1 Add additive migration: nullable `user.limits jsonb` column (no backfill)
- [x] 4.2 Add an example override to `infra/postgres/seeds/dev-users-plan.sql`

## 5. Admin management surface

- [x] 5.1 Add `admin/limits.ts` route: `PUT /admin/accounts/{userId}/limits` (validate via `OverrideLimitsSchema`, persist) guarded by existing admin authorization
- [x] 5.2 Add `DELETE /admin/accounts/{userId}/limits` (set column NULL → revert to class defaults)
- [x] 5.3 Emit an audit log line on set/clear (reuse admin/submission logging idiom)
- [x] 5.4 Add route tests: valid set, invalid rejected, clear reverts, non-admin rejected

## 6. Verification

- [x] 6.1 `bun run typecheck && bun run lint && bun run test` green
- [x] 6.2 Backend E2E (`bun run test:int`) covering an account with an override enforcing different limits than its plan default
- [x] 6.3 Update CLAUDE.md plan/auth notes if the auth-context shape change warrants it