Merged
11 changes: 11 additions & 0 deletions DEVELOPMENT.md
@@ -187,3 +187,14 @@ virtual environment's Python executable directly:
```

Use **`uv run python -m pytest`** from the repo root so imports like **`from tests.test_spine import …`** resolve the same way as in CI.

## Environment variables

| Variable | Component | Description |
|----------|-----------|-------------|
| `FLIGHTDECK_LOCAL_API_TOKEN` | Server | When set, `POST /v1/promote` and `POST /v1/rollback` require `Authorization: Bearer <token>`. Read endpoints and `POST /v1/events` are unaffected. See [docs/http-api.md](docs/http-api.md) and [SECURITY.md](SECURITY.md). |
| `FLIGHTDECK_USE_SYSTEM_TEMP` | Tests | Set to `1` to force pytest to use the OS default temp directory instead of the repo-local `.tmp/` directory. Useful on developer machines where `%TEMP%` works correctly (see *Troubleshooting* above). |
| `USER` / `USERNAME` | CLI | Used to populate the `actor` field on promote, rollback, and pricing import audit records. `USER` is checked first (Unix/macOS), then `USERNAME` (Windows); falls back to `"unknown"`. |
| `VITE_FLIGHTDECK_LOCAL_API_TOKEN` | Web dev server | Build-time variable for the React UI dev server (Vite). Copy `web/.env.example` → `web/.env.local` to set it when testing mutations through `npm run dev` against a token-protected server. |
| `VITE_DEV_PROXY_TARGET` | Web dev server | Overrides the Vite proxy target for `/v1` (default: `http://127.0.0.1:8765`). |
| `TMPDIR` / `TEMP` / `TMP` | Tests / OS | Standard temp directory environment variables. Set any of these to a repo-local `.tmp/` path if the OS default is restricted or permissions cause pytest failures. |
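
When scripting against a token-protected server, the `Authorization: Bearer <token>` header from the table can be built roughly as follows. This is a minimal sketch: the helper name `auth_headers` is hypothetical, and having the client read `FLIGHTDECK_LOCAL_API_TOKEN` is an assumption made for convenience (the table documents the variable for the server only).

```python
import os


def auth_headers(env=None):
    """Build headers for POST /v1/promote or POST /v1/rollback.

    Hypothetical helper: assumes the client reuses the value of the
    server's FLIGHTDECK_LOCAL_API_TOKEN. No Authorization header is
    added when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    token = env.get("FLIGHTDECK_LOCAL_API_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Read endpoints and `POST /v1/events` do not need the header, so the same helper can be used unconditionally.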
16 changes: 15 additions & 1 deletion docs/cli.md
@@ -16,7 +16,21 @@ serve` see [http-api.md](http-api.md).
| `--help` | Print help for any command or subcommand |

All commands require a `flightdeck.yaml` in the working directory (or the default path
`./flightdeck.yaml`). Run `flightdeck init` to create one.
`./flightdeck.yaml`). Run `flightdeck init` to create one. The only exception is
`flightdeck init` itself — it writes the file and does not call `load_config`.

## Actor resolution

Several commands that write to the audit ledger (`release promote`, `release rollback`,
`pricing import`) record an `actor` value. For CLI commands, `actor` is resolved from
the environment at invocation time:

1. `USER` environment variable (Unix / macOS)
2. `USERNAME` environment variable (Windows)
3. The literal `"unknown"` if neither variable is set

The HTTP API's `POST /v1/promote` and `POST /v1/rollback` accept an explicit `"actor"`
field in the request body (defaults to `"http"` when omitted).
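
The lookup order above can be sketched in a few lines. This is an illustration, not the CLI's actual code: the function name `resolve_actor` is hypothetical, and treating an empty variable the same as an unset one is an assumption.

```python
import os


def resolve_actor(env=None):
    """Return the audit actor: USER, then USERNAME, then "unknown".

    Hypothetical sketch. Empty values are treated as unset, which is
    an assumption and may differ from the real CLI behavior.
    """
    env = os.environ if env is None else env
    return env.get("USER") or env.get("USERNAME") or "unknown"
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test on any platform.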

## Exit codes

20 changes: 17 additions & 3 deletions docs/http-api.md
@@ -176,16 +176,29 @@ Ingest `RunEvent` records (runtime evidence for diff and policy evaluation).
}
```

`api_version` may be omitted (defaults to `"v1"`). Any other value returns HTTP 400.
`api_version` may be omitted (defaults to `"v1"`). Any other value — including `""`,
`null`, wrong case like `"V1"`, or unknown strings — returns HTTP 400 with a message of
the form `"Unsupported api_version for POST /v1/events: <value> (only 'v1' is accepted)."`.
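
The rule can be illustrated with a small check over the decoded JSON body. This is a sketch under assumptions: the function name `api_version_error` is hypothetical, and the exact rendering of non-string values such as `null` inside the message is not specified above.

```python
def api_version_error(payload):
    """Return the HTTP 400 message, or None when api_version is accepted.

    Hypothetical sketch: only a missing key defaults to "v1". An
    explicit null, "", "V1", or any other value is rejected.
    """
    value = payload.get("api_version", "v1")
    if value == "v1":
        return None
    return (
        f"Unsupported api_version for POST /v1/events: {value} "
        "(only 'v1' is accepted)."
    )
```

The key point is the difference between an omitted key, which defaults, and an explicit `null`, which does not.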

`run_id` must be unique per workspace; duplicates are silently ignored by storage.

The `events` array must contain **at least one event**. An empty array (`"events": []`)
is rejected by Pydantic validation with HTTP **422** before any event processing occurs.

**Response**
```json
{"inserted": 1}
```

`inserted` is the count of **newly written** rows. Events with a `run_id` that already
exists in storage are silently skipped; they do not increment `inserted` and do not
produce an error.
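
The counting rule can be pictured with an in-memory stand-in for storage. This is only a sketch: `insert_events` is a hypothetical name, and a plain set of `run_id` values replaces the real per-workspace storage.

```python
def insert_events(existing_run_ids, events):
    """Insert events, skipping any run_id already stored.

    Hypothetical sketch: `existing_run_ids` is a set standing in for
    one workspace's storage. Duplicates are skipped without an error
    and do not count toward `inserted`.
    """
    inserted = 0
    for event in events:
        run_id = event["run_id"]
        if run_id in existing_run_ids:
            continue  # duplicate: silently skipped
        existing_run_ids.add(run_id)
        inserted += 1
    return {"inserted": inserted}
```

A client that retries a batch after a timeout can therefore see `{"inserted": 0}` on the second attempt without that indicating a failure.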

**Errors**
- HTTP 400 — unsupported `api_version` or malformed `RunEvent` field.
- HTTP 400 — unsupported `api_version` value, or a field in a `RunEvent` fails type/range
validation after the per-event `api_version` check.
- HTTP 422 — `events` array is empty or the request body does not match the expected shape
(Pydantic validation error; returned as an array under `detail`).

Full field reference: [`schemas/v1/run_event.schema.json`](../schemas/v1/run_event.schema.json).

@@ -316,7 +329,8 @@ Default thresholds (from `WorkspaceConfig.diff`): `min_candidate_runs=500`,
`min_baseline_runs=500`, `min_low_runs=50`. Override per-workspace or via the active policy.

**Errors**
- HTTP 400 — unknown release ID, missing pricing table, cross-agent diff, or invalid
- HTTP 400 — unknown release ID, missing pricing table, cross-agent diff (releases have
different `agent_id`), inconsistent `agent_id` within one side's run events, or invalid
`window` format. The `detail` field describes the specific problem.

---
34 changes: 34 additions & 0 deletions docs/operations-and-policy.md
@@ -106,12 +106,45 @@ cost = (input_tokens / 1000) * input_usd_per_1k

Per-run costs are averaged across all run events in the window to produce `cost_per_run_usd`.

### `compute_diff` vs. `promote_release` / `rollback_release`: filter scope

`compute_diff` supports optional `tenant_id` and `task_id` filters in addition to
`environment`. These allow you to narrow the evidence window to a specific tenant or task
type when comparing releases.

`_evaluate_promotion_or_rollback` (the shared path for `promote` and `rollback`) does
**not** accept tenant or task filters. It queries run events for the entire environment
over the window:

```python
# promote/rollback path — no tenant_id or task_id argument passed
storage.query_runs(release_id, since, until, environment=environment)
```

This means **policy evaluation for promote/rollback aggregates all runs in the
environment over the window**, regardless of tenant or task. The active policy applies to
the full population of events for that release, not a filtered slice. If you need
tenant-scoped evaluation, use `release diff` first to inspect the filtered evidence, then
decide whether to promote.
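
The difference in scope can be shown with a toy selector over in-memory events. This is a sketch under assumptions: `select_runs` is a hypothetical name, and the dictionary keys `environment`, `tenant_id`, and `task_id` are assumed field names on each event.

```python
def select_runs(events, environment, tenant_id=None, task_id=None):
    """Filter events by environment and, optionally, tenant and task.

    Hypothetical sketch. Calling it with only `environment` mirrors the
    promote/rollback scope; adding tenant_id/task_id mirrors the
    narrower scope available to a diff.
    """
    return [
        e
        for e in events
        if e["environment"] == environment
        and (tenant_id is None or e["tenant_id"] == tenant_id)
        and (task_id is None or e["task_id"] == task_id)
    ]


events = [
    {"environment": "prod", "tenant_id": "t1", "task_id": "summarize"},
    {"environment": "prod", "tenant_id": "t2", "task_id": "summarize"},
    {"environment": "staging", "tenant_id": "t1", "task_id": "summarize"},
]

promote_scope = select_runs(events, "prod")               # all prod runs
diff_scope = select_runs(events, "prod", tenant_id="t1")  # one tenant only
```

A policy that passes on the tenant-filtered slice can still fail on the full environment population, and the reverse is also possible.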

### Important constraint: cross-agent diffs

`compute_diff` checks that both releases have the same `agent_id` in their artifact
spec *before* querying events. `diff_releases` repeats the check when both sides have
at least one run event.

`diff_releases` also enforces that all events on a given side share a single `agent_id`.
If events for the baseline (or candidate) release span multiple agent IDs, the diff is
rejected with:

```
Each side of the diff must have a single consistent agent_id among run events.
```

This can happen if run events from different agents were ingested under the same
`release_id`. Ensure every `RunEvent` for a release carries the `agent_id` that matches
`spec.agent.agent_id` in the release artifact.
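
The per-side consistency rule amounts to a check like the one below. This is an illustrative sketch, not the code in `diff_releases`: the function name `check_single_agent`, the use of `ValueError`, and the acceptance of an empty event list are all assumptions.

```python
def check_single_agent(events):
    """Reject one side of a diff whose events span several agent IDs.

    Hypothetical sketch: the exception type and the handling of an
    empty list are assumptions.
    """
    agent_ids = {event["agent_id"] for event in events}
    if len(agent_ids) > 1:
        raise ValueError(
            "Each side of the diff must have a single consistent "
            "agent_id among run events."
        )
```

Running a similar check over events before ingesting them is one way to catch a mislabeled `agent_id` early.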

### Rollup semantics

`ledger.compute_rollup` aggregates a list of `RunEvent` objects into a `Rollup`:
@@ -413,6 +446,7 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).
| `Unknown baseline release: rel_...` | Release not registered | `flightdeck release register <path>` |
| `Missing pricing table for baseline openai/2024-02` | Pricing not imported | `flightdeck pricing import <path>` |
| `Cross-agent diff is not allowed` | Releases belong to different agents | Use releases from the same `agent_id` |
| `Each side of the diff must have a single consistent agent_id among run events` | Ingested events for that release contain mixed `agent_id` values | Verify all `RunEvent` records use the correct `agent_id` matching the release artifact; re-ingest corrected events |
| `Pricing table missing model entry` | Pricing table does not list the model used in the release | Add the model to the pricing YAML and reimport with `--replace` |
| `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
| `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |