From 2047b637b2410b88c076ae0ecf6973c7af50ad5b Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Sat, 2 May 2026 12:02:05 +0000
Subject: [PATCH] docs: document validation edge cases, promote/rollback scope, actor resolution, env vars
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- docs/http-api.md: clarify POST /v1/events validation rules — empty events
  array returns HTTP 422 (Pydantic); api_version values other than 'v1'
  (empty string, null, wrong case, unknown strings) return HTTP 400 with a
  specific message; document that 'inserted' counts only newly written
  rows; add 422 to Errors table; note inconsistent agent_id within one
  side's events as a 400 source on POST /v1/diff.

- docs/operations-and-policy.md: add 'compute_diff vs. promote/rollback:
  filter scope' subsection explaining that promote/rollback query events by
  environment only (no tenant_id or task_id filter), whereas compute_diff
  supports all three; expand 'cross-agent diffs' section to document the
  mixed-agent_id-within-events error and its cause; add the new error to
  the common errors table.

- docs/cli.md: document that flightdeck init does not require a
  pre-existing flightdeck.yaml (it is the exception to the 'all commands
  require config' rule); add 'Actor resolution' section documenting
  USER / USERNAME / 'unknown' fallback for CLI audit records and how the
  HTTP API actor field differs.

- DEVELOPMENT.md: add 'Environment variables' reference table covering
  FLIGHTDECK_LOCAL_API_TOKEN, FLIGHTDECK_USE_SYSTEM_TEMP, USER/USERNAME,
  VITE_FLIGHTDECK_LOCAL_API_TOKEN, VITE_DEV_PROXY_TARGET, and
  TMPDIR/TEMP/TMP.
Co-authored-by: Gottam Sai Bharath
---
 DEVELOPMENT.md                | 11 +++++++++++
 docs/cli.md                   | 16 +++++++++++++++-
 docs/http-api.md              | 20 +++++++++++++++++---
 docs/operations-and-policy.md | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index aaa1a7f..2b1134d 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -187,3 +187,14 @@ virtual environment's Python executable directly:
 ```
 
 Use **`uv run python -m pytest`** from the repo root so imports like **`from tests.test_spine import …`** resolve the same way as in CI.
+
+## Environment variables
+
+| Variable | Component | Description |
+|----------|-----------|-------------|
+| `FLIGHTDECK_LOCAL_API_TOKEN` | Server | When set, `POST /v1/promote` and `POST /v1/rollback` require `Authorization: Bearer `. Read endpoints and `POST /v1/events` are unaffected. See [docs/http-api.md](docs/http-api.md) and [SECURITY.md](SECURITY.md). |
+| `FLIGHTDECK_USE_SYSTEM_TEMP` | Tests | Set to `1` to force pytest to use the OS default temp directory instead of the repo-local `.tmp/` directory. Useful on developer machines where `%TEMP%` works correctly (see *Troubleshooting* above). |
+| `USER` / `USERNAME` | CLI | Used to populate the `actor` field on promote, rollback, and pricing import audit records. `USER` is checked first (Unix/macOS), then `USERNAME` (Windows); falls back to `"unknown"`. |
+| `VITE_FLIGHTDECK_LOCAL_API_TOKEN` | Web dev server | Build-time variable for the React UI dev server (Vite). Copy `web/.env.example` → `web/.env.local` to set it when testing mutations through `npm run dev` against a token-protected server. |
+| `VITE_DEV_PROXY_TARGET` | Web dev server | Overrides the Vite proxy target for `/v1` (default: `http://127.0.0.1:8765`). |
+| `TMPDIR` / `TEMP` / `TMP` | Tests / OS | Standard temp directory environment variables. Set any of these to a repo-local `.tmp/` path if the OS default is restricted or permissions cause pytest failures. |
diff --git a/docs/cli.md b/docs/cli.md
index 5ea1619..0d99ab1 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -16,7 +16,21 @@ serve` see [http-api.md](http-api.md).
 | `--help` | Print help for any command or subcommand |
 
 All commands require a `flightdeck.yaml` in the working directory (or the default path
-`./flightdeck.yaml`). Run `flightdeck init` to create one.
+`./flightdeck.yaml`). Run `flightdeck init` to create one. The only exception is
+`flightdeck init` itself — it writes the file and does not call `load_config`.
+
+## Actor resolution
+
+Several commands that write to the audit ledger (`release promote`, `release rollback`,
+`pricing import`) record an `actor` value. For CLI commands, `actor` is resolved from
+the environment at invocation time:
+
+1. `USER` environment variable (Unix / macOS)
+2. `USERNAME` environment variable (Windows)
+3. Falls back to `"unknown"` if neither is set
+
+The HTTP API's `POST /v1/promote` and `POST /v1/rollback` accept an explicit `"actor"`
+field in the request body (defaults to `"http"` when omitted).
 
 ## Exit codes
 
diff --git a/docs/http-api.md b/docs/http-api.md
index 232cbd8..6f340aa 100644
--- a/docs/http-api.md
+++ b/docs/http-api.md
@@ -176,16 +176,29 @@ Ingest `RunEvent` records (runtime evidence for diff and policy evaluation).
 }
 ```
 
-`api_version` may be omitted (defaults to `"v1"`). Any other value returns HTTP 400.
+`api_version` may be omitted (defaults to `"v1"`). Any other value — including `""`,
+`null`, wrong case like `"V1"`, or unknown strings — returns HTTP 400 with a message of
+the form `"Unsupported api_version for POST /v1/events: (only 'v1' is accepted)."`.
+
 `run_id` must be unique per workspace; duplicates are silently ignored by storage.
 
+The `events` array must contain **at least one event**. An empty array (`"events": []`)
+is rejected by Pydantic validation with HTTP **422** before any event processing occurs.
+
 **Response**
 ```json
 {"inserted": 1}
 ```
 
+`inserted` is the count of **newly written** rows. Events with a `run_id` that already
+exists in storage are silently skipped; they do not increment `inserted` and do not
+produce an error.
+
 **Errors**
-- HTTP 400 — unsupported `api_version` or malformed `RunEvent` field.
+- HTTP 400 — unsupported `api_version` value, or a field in a `RunEvent` fails type/range
+  validation after the per-event `api_version` check.
+- HTTP 422 — `events` array is empty or the request body does not match the expected shape
+  (Pydantic validation error; returned as an array under `detail`).
 
 Full field reference: [`schemas/v1/run_event.schema.json`](../schemas/v1/run_event.schema.json).
 
@@ -316,7 +329,8 @@ Default thresholds (from `WorkspaceConfig.diff`): `min_candidate_runs=500`,
 `min_baseline_runs=500`, `min_low_runs=50`. Override per-workspace or via the active policy.
 
 **Errors**
-- HTTP 400 — unknown release ID, missing pricing table, cross-agent diff, or invalid
+- HTTP 400 — unknown release ID, missing pricing table, cross-agent diff (releases have
+  different `agent_id`), inconsistent `agent_id` within one side's run events, or invalid
   `window` format. The `detail` field describes the specific problem.
 
 ---
diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md
index eefc9d3..ebc5fbe 100644
--- a/docs/operations-and-policy.md
+++ b/docs/operations-and-policy.md
@@ -106,12 +106,45 @@ cost = (input_tokens / 1000) * input_usd_per_1k
 
 Runs are averaged across all events in the window to produce `cost_per_run_usd`.
 
+### `compute_diff` vs. `promote_release` / `rollback_release`: filter scope
+
+`compute_diff` supports optional `tenant_id` and `task_id` filters in addition to
+`environment`. These allow you to narrow the evidence window to a specific tenant or task
+type when comparing releases.
+
+`_evaluate_promotion_or_rollback` (the shared path for `promote` and `rollback`) does
+**not** accept tenant or task filters. It queries run events for the entire environment
+over the window:
+
+```python
+# promote/rollback path — no tenant_id or task_id argument passed
+storage.query_runs(release_id, since, until, environment=environment)
+```
+
+This means **policy evaluation for promote/rollback aggregates all runs in the
+environment over the window**, regardless of tenant or task. The active policy applies to
+the full population of events for that release, not a filtered slice. If you need
+tenant-scoped evaluation, use `release diff` first to inspect the filtered evidence, then
+decide whether to promote.
+
 ### Important constraint: cross-agent diffs
 
 `compute_diff` checks that both releases have the same `agent_id` in their artifact spec
 *before* querying events. This is checked again inside `diff_releases` if run events from
 both sides are non-empty.
 
+`diff_releases` also enforces that all events on a given side share a single `agent_id`.
+If events for the baseline (or candidate) release span multiple agent IDs, the diff is
+rejected with:
+
+```
+Each side of the diff must have a single consistent agent_id among run events.
+```
+
+This can happen if `run_id` values from different agents were ingested under the same
+`release_id`. Ensure every `RunEvent` for a release carries the correct `agent_id`
+matching `spec.agent.agent_id` in the release artifact.
+
 ### Rollup semantics
 
 `ledger.compute_rollup` aggregates a list of `RunEvent` objects into a `Rollup`:
@@ -413,6 +446,7 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).
 | `Unknown baseline release: rel_...` | Release not registered | `flightdeck release register ` |
 | `Missing pricing table for baseline openai/2024-02` | Pricing not imported | `flightdeck pricing import ` |
 | `Cross-agent diff is not allowed` | Releases belong to different agents | Use releases from the same `agent_id` |
+| `Each side of the diff must have a single consistent agent_id among run events` | Ingested events for that release contain mixed `agent_id` values | Verify all `RunEvent` records use the correct `agent_id` matching the release artifact; re-ingest corrected events |
 | `Pricing table missing model entry` | Pricing table does not list the model used in the release | Add the model to the pricing YAML and reimport with `--replace` |
 | `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
 | `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |
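
The mixed-`agent_id` error added to the table above could be caught on the client side before events are ingested. The following is a minimal sketch, assuming run events are plain dicts with an `agent_id` key. `check_single_agent_id` is a hypothetical helper written for illustration and is not part of Flightdeck; only the error message text is taken from the patch.

```python
from typing import Iterable, Mapping, Optional


def check_single_agent_id(events: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the one agent_id shared by all events, or None if there are no events.

    Hypothetical pre-ingest check: it mirrors the documented rule that every
    event on one side of a diff must carry the same agent_id.
    """
    # Collect the distinct agent IDs seen across the events for one release.
    agent_ids = {event["agent_id"] for event in events}
    if len(agent_ids) > 1:
        raise ValueError(
            "Each side of the diff must have a single consistent agent_id "
            "among run events."
        )
    # Either exactly one ID was seen, or the input was empty.
    return next(iter(agent_ids), None)
```

Running a check like this over the events for each `release_id` before calling `POST /v1/events` would surface mislabelled events early, rather than at diff time.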