Merged
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -63,7 +63,7 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

### Changed

- **Slim distribution:** this repository omits the full in-tree **`docs/`** tree, org mirror scripts, and **`verify-repo-standards`** wrappers. Narrative docs and maintainer runbooks live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable.
- **Slim distribution:** this repository ships a focused in-tree **`docs/`** tree (CLI, HTTP API, SDK, operations/policy, release artifact, web UI references); org mirror scripts and **`verify-repo-standards`** wrappers are not included. Extended maintainer runbooks and the canonical README live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable.
- **`pyproject.toml`:** OpenTelemetry packages are **optional** only (**`telemetry`** / **`all`** extras); the default install matches the **1.0.0** dependency story (core does not import OpenTelemetry).
- **`.pre-commit-config.yaml`:** **ruff** replaces **black** / **isort**; **`ruff-pre-commit`** pinned to **v0.15.12** to match **`dev`** (**`ruff==0.15.12`**).
- **CI:** Python **3.13** and **3.14** added to the Ubuntu and Windows matrices (superseded by **3.14**-only policy as of **1.0.2**).
10 changes: 9 additions & 1 deletion ROADMAP.md
@@ -21,7 +21,15 @@ This roadmap is meant to be clear from **what is already shipped** to **near-ter

## Next release

**v1.0.4** (patch): Phase 0 closing slice — **`GET /v1/metrics`** (JSON ledger counters); **`pricing.prices`** on **`POST /v1/diff`** plus CLI **Per-1k token prices** line and matching web diff detail when pricing/model changes; **[examples/README.md](examples/README.md)** end-to-end walkthrough linking **integration**, **CI**, and **deploy** examples. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts.
**v1.0.4 is shipped.** See **[CHANGELOG.md](CHANGELOG.md)** for the full list of additions. The
v1.0.4 slice delivered: `GET /v1/metrics` (JSON ledger counters), `pricing.prices` on
`POST /v1/diff`, CLI **Per-1k token prices** output, matching web diff banner detail, and the
**[examples/README.md](examples/README.md)** end-to-end walkthrough.

**v1.0.5 / next patch:** candidates include documentation completeness improvements (time-window
semantics, error message catalog, checksum format, tenant/task filter scope in UI), and
continued Phase 0 hardening. No breaking changes expected to stable CLI, HTTP, or
**`api_version` `v1`** contracts.

---

29 changes: 25 additions & 4 deletions docs/http-api.md
@@ -105,13 +105,18 @@ List all registered releases.
"agent_id": "agent_support",
"version": "1.2.0",
"environment": "production",
"checksum": "sha256:...",
"checksum": "a3f1c2e4b5d6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2",
"created_at": "2026-05-01T12:00:00+00:00"
}
]
}
```

`checksum` is a **64-character lowercase hex string** (raw SHA-256; no `sha256:` prefix). The
same value is printed with a `sha256=` label by `flightdeck release show` and
`flightdeck release verify` for human readability, but the stored and returned value is the
bare hex.
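
A client that compares the API value against CLI output may want to normalize both to the bare hex form first. The following is a minimal sketch under that assumption; `normalize_checksum` is a hypothetical helper, not part of FlightDeck:

```python
import re

# Bare SHA-256 digest: exactly 64 lowercase hex characters.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def normalize_checksum(value: str) -> str:
    """Return the bare hex digest, dropping a 'sha256=' display label if present.

    Hypothetical helper for illustration only.
    """
    bare = value.strip()
    if bare.startswith("sha256="):
        bare = bare[len("sha256="):]
    if not _HEX64.fullmatch(bare):
        raise ValueError(f"not a bare SHA-256 hex digest: {value!r}")
    return bare
```

The older `sha256:` prefixed form is rejected here on purpose, since the text above says the stored value carries no such prefix.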

---

## `GET /v1/promoted`
@@ -137,13 +142,16 @@ List the currently promoted release for each `agent_id` / `environment` pair.

List promotion and rollback actions from the audit ledger.

Results are returned **newest first** (`ORDER BY created_at DESC`), so the most recent action
is always the first element in the array.

**Query parameters**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `agent` | string | — | Filter by `agent_id` |
| `env` | string | — | Filter by environment |
| `limit` | integer | 50 | Max records returned (1–500) |
| `limit` | integer | 50 | Max records returned; the server enforces a minimum of 1 and a maximum of 500 |

**Response**
```json
@@ -229,7 +237,11 @@ produce an error.

**Errors**
- HTTP 400 — unsupported `api_version` value, or a field in a `RunEvent` fails type/range
validation after the per-event `api_version` check.
validation after the per-event `api_version` check. Field validation errors include
the prefix `"Invalid RunEvent: "` in the `detail` string, e.g.
`"Invalid RunEvent: 1 validation error for RunEvent …"`. Client code that parses
error messages can key off this prefix to distinguish per-event validation failures
from `api_version` rejections.
- HTTP 422 — `events` array is empty or the request body does not match the expected shape
(Pydantic validation error; returned as an array under `detail`).
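
The error cases above could be told apart on the client roughly as follows. This is a sketch only: `classify_ingest_error` and its return labels are hypothetical, and it assumes a 400 whose `detail` lacks the `"Invalid RunEvent: "` prefix is an `api_version` rejection, as the list describes:

```python
from typing import Any

INVALID_EVENT_PREFIX = "Invalid RunEvent: "


def classify_ingest_error(status: int, detail: Any) -> str:
    """Map an error response to a coarse category (hypothetical labels)."""
    if status == 422:
        # Request-shape failure; `detail` is an array of Pydantic errors.
        return "request_shape"
    if status == 400 and isinstance(detail, str):
        if detail.startswith(INVALID_EVENT_PREFIX):
            return "event_validation"
        # Assumption: the only other documented 400 cause is api_version.
        return "api_version"
    return "other"
```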

@@ -301,9 +313,18 @@ the audit ledger.
}
```

`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required.
`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required. `N` must be
a positive integer. Seconds and weeks are not supported. Examples: `"7d"`, `"24h"`,
`"30m"`. Invalid formats return HTTP 400.
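
Clients that want to reject bad windows before sending a request could mirror the format rules locally. A minimal sketch, assuming the rules are exactly as stated above; `parse_window_sketch` is a hypothetical stand-in and not the server's own parser:

```python
import re
from datetime import timedelta

# {N}{unit}, where unit is d, h, or m and N is an ASCII integer.
_WINDOW = re.compile(r"([0-9]+)([dhm])")
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_window_sketch(window: str) -> timedelta:
    """Parse '7d', '24h', or '30m' into a timedelta (illustrative only)."""
    match = _WINDOW.fullmatch(window)
    if match is None:
        raise ValueError(f"invalid window format: {window!r}")
    count, unit = int(match.group(1)), match.group(2)
    if count <= 0:
        raise ValueError(f"window must be a positive integer: {window!r}")
    return timedelta(**{_UNITS[unit]: count})
```

Whether the server accepts leading zeros such as `"07d"` is not stated; this sketch accepts them, which is an assumption.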

`environment` defaults to `WorkspaceConfig.default_environment` when `null`.

**Time-window semantics:** run events are queried with `timestamp >= since AND timestamp <
until`. The interval is **half-open**: `since` is inclusive, `until` is exclusive. Both
boundaries are in UTC. `until` is set to the server's clock at the moment the request is
processed; `since` is `until - window_delta`. An event exactly at `until` is **not**
included.
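
The half-open interval can be illustrated with a small sketch. The function names are hypothetical, and `now` is passed in explicitly so the boundaries are reproducible:

```python
from datetime import datetime, timedelta, timezone


def window_bounds(now: datetime, delta: timedelta) -> tuple[datetime, datetime]:
    """Return (since, until) in UTC; `now` must be timezone-aware."""
    until = now.astimezone(timezone.utc)
    since = until - delta
    return since, until


def in_window(timestamp: datetime, since: datetime, until: datetime) -> bool:
    """Half-open membership test: since is inclusive, until is exclusive."""
    return since <= timestamp < until
```

With a 24-hour window, an event stamped exactly at `since` is counted and one stamped exactly at `until` is not.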

**Response**
```json
{
33 changes: 30 additions & 3 deletions docs/operations-and-policy.md
@@ -86,7 +86,11 @@ compute_diff(
3. Load the pricing table for each release (provider + pricing_version from
`spec.pricing_reference`). Missing tables raise `OperationError` with a hint to run
`flightdeck pricing import`.
4. Parse `window` into a `timedelta`; compute `since = now - delta`, `until = now`.
4. Parse `window` into a `timedelta` via `ledger.parse_window`. Valid units are `d`
(days), `h` (hours), and `m` (minutes) — seconds and weeks are not supported. The
numeric part must be a positive integer; `"0h"`, `"-7d"`, and `"7w"` all raise
`OperationError`. Set `until = now` (UTC at call time), then compute `since = until - delta`.
Events are queried with `timestamp >= since AND timestamp < until` (half-open interval).
5. Query `run_events` for each release ID filtered by environment, tenant, task, and the
time window.
6. Call `ledger.diff_releases` to compute per-side rollups (cost, latency, error rate),
@@ -145,6 +149,24 @@ This can happen if `run_id` values from different agents were ingested under the
`release_id`. Ensure every `RunEvent` for a release carries the correct `agent_id`
matching `spec.agent.agent_id` in the release artifact.

### Diffs where one side has no run events

`diff_releases` runs the cross-agent consistency check only when **both** sides
have events. If one side (or both) has zero events in the window, the consistency check is
skipped. The rollup for the empty side evaluates to zero runs, zero cost, no latency data,
and zero error rate. Confidence is determined by the sample count thresholds as normal:

- With default thresholds (`min_candidate_runs=500`, `min_baseline_runs=500`,
`min_low_runs=50`), a baseline with zero runs will produce `LOW` confidence.
- With all thresholds set to `0` (staging policy), zero events on either side can reach
`HIGH` confidence.

**Practical implication:** if you register a new baseline with no run history and
immediately diff it against a candidate, the diff completes without error, but
`baseline_runs` is 0 and, with default thresholds, confidence is `LOW`. This is a valid
signal: the baseline has no observable data to compare against.
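
One way to reason about the two threshold cases above is the sketch below. It is an assumption, not the actual `diff_releases` logic: the exact rule and the existence of a `MEDIUM` level are not stated here, and `Thresholds` and `confidence_sketch` are hypothetical names. It only reproduces the two outcomes described above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults taken from the text above.
    min_candidate_runs: int = 500
    min_baseline_runs: int = 500
    min_low_runs: int = 50


def confidence_sketch(
    baseline_runs: int, candidate_runs: int, t: Thresholds = Thresholds()
) -> str:
    """Assumed rule: HIGH when both sides meet their minimums, LOW when
    either side is under min_low_runs, otherwise MEDIUM (assumed level)."""
    if baseline_runs >= t.min_baseline_runs and candidate_runs >= t.min_candidate_runs:
        return "HIGH"
    if min(baseline_runs, candidate_runs) < t.min_low_runs:
        return "LOW"
    return "MEDIUM"
```

Under this rule an empty baseline gives `LOW` with the defaults and `HIGH` when every threshold is `0`, matching the two bullets.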

### Pricing and model change detection

`DiffOutcome` includes a `pricing_or_model_changed` flag that is `True` when any of the
@@ -525,13 +547,18 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).

| Error | Cause | Fix |
|-------|-------|-----|
| `Unknown baseline release: rel_...` | Release not registered | `flightdeck release register <path>` |
| `Missing pricing table for baseline openai/2024-02` | Pricing not imported | `flightdeck pricing import <path>` |
| `Unknown baseline release: rel_...` | Baseline release ID not registered | `flightdeck release register <path>` |
| `Unknown candidate release: rel_...` | Candidate release ID not registered | `flightdeck release register <path>` |
| `Missing pricing table for baseline openai/2024-02` | Pricing not imported for baseline provider/version | `flightdeck pricing import <path>` |
| `Missing pricing table for candidate openai/2024-02` | Pricing not imported for candidate provider/version | `flightdeck pricing import <path>` |
| `Missing pricing table for rollback target openai/2024-02` | Pricing not imported for promote/rollback target | `flightdeck pricing import <path>` |
| `Missing pricing table for promoted_baseline openai/2024-02` | Pricing for the currently-promoted baseline is not present | Import the missing table with `flightdeck pricing import <path>` |
| `Cross-agent diff is not allowed` | Releases belong to different agents | Use releases from the same `agent_id` |
| `Each side of the diff must have a single consistent agent_id among run events` | Ingested events for that release contain mixed `agent_id` values | Verify all `RunEvent` records use the correct `agent_id` matching the release artifact; re-ingest corrected events |
| `Pricing table missing model entry` | Pricing table does not list the model used in the release | Add the model to the pricing YAML and reimport with `--replace` |
| `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
| `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |
| `Promoted baseline release is missing: rel_...` | A promoted pointer exists but the referenced release record is gone (e.g. manual DB edit) | Restore from backup; then re-register the release if the artifact is available and promote it to reset the pointer |
| `Workspace config not found: flightdeck.yaml` | Missing `flightdeck.yaml` | `flightdeck init` |

---
7 changes: 7 additions & 0 deletions docs/web-ui.md
@@ -149,6 +149,13 @@ Form-based interface for `POST /v1/diff`. Fields mirror the request body:
| Window | `7d` | `window` |
| Environment | `local` | `environment` (sent as `null` when empty) |

`tenant_id` and `task_id` are **not exposed** in the UI form. To run a diff narrowed to a
specific tenant or task, use the CLI (`flightdeck release diff --tenant <id> --task <id>`)
or call `POST /v1/diff` directly with the `tenant_id` and `task_id` fields. See
[http-api.md § POST /v1/diff](http-api.md#post-v1diff) and
[operations-and-policy.md § compute_diff vs. promote_release filter scope](operations-and-policy.md#compute_diff-vs-promote_release--rollback_release-filter-scope)
for details on what those filters affect.
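
For the direct-API route, a request narrowed by tenant and task might be assembled as in the sketch below. `build_diff_request` is a hypothetical helper, the base URL is a placeholder, and the caller supplies the rest of the body, since the release fields are not listed here. Only `tenant_id`, `task_id`, `window`, and `environment` come from the text above. The request is built but not sent:

```python
import json
import urllib.request
from typing import Any, Optional


def build_diff_request(
    base_url: str,
    body: dict[str, Any],
    tenant_id: Optional[str] = None,
    task_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST /v1/diff request with optional filters."""
    payload = dict(body)  # copy so the caller's dict is not mutated
    if tenant_id is not None:
        payload["tenant_id"] = tenant_id
    if task_id is not None:
        payload["task_id"] = task_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/diff",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Any authentication the server requires is outside the scope of this sketch.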

On submit, the raw diff response is parsed and rendered as:

- **Summary card:** policy badge (PASS / FAIL), failure reasons list, sample counts and