diff --git a/CHANGELOG.md b/CHANGELOG.md index 841cfb3..9ff8524 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -63,7 +63,7 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0** ### Changed -- **Slim distribution:** this repository omits the full in-tree **`docs/`** tree, org mirror scripts, and **`verify-repo-standards`** wrappers. Narrative docs and maintainer runbooks live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable. +- **Slim distribution:** this repository ships a focused in-tree **`docs/`** tree (CLI, HTTP API, SDK, operations/policy, release artifact, web UI references); org mirror scripts and **`verify-repo-standards`** wrappers are not included. Extended maintainer runbooks and the canonical README live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable. - **`pyproject.toml`:** OpenTelemetry packages are **optional** only (**`telemetry`** / **`all`** extras); the default install matches the **1.0.0** dependency story (core does not import OpenTelemetry). - **`.pre-commit-config.yaml`:** **ruff** replaces **black** / **isort**; **`ruff-pre-commit`** pinned to **v0.15.12** to match **`dev`** (**`ruff==0.15.12`**). - **CI:** Python **3.13** and **3.14** added to the Ubuntu and Windows matrices (superseded by **3.14**-only policy as of **1.0.2**). 
diff --git a/ROADMAP.md b/ROADMAP.md index 69821ab..f53ed46 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -21,7 +21,15 @@ This roadmap is meant to be clear from **what is already shipped** to **near-ter ## Next release -**v1.0.4** (patch): Phase 0 closing slice — **`GET /v1/metrics`** (JSON ledger counters); **`pricing.prices`** on **`POST /v1/diff`** plus CLI **Per-1k token prices** line and matching web diff detail when pricing/model changes; **[examples/README.md](examples/README.md)** end-to-end walkthrough linking **integration**, **CI**, and **deploy** examples. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts. +**v1.0.4 is shipped.** See **[CHANGELOG.md](CHANGELOG.md)** for the full list of additions. The +v1.0.4 slice delivered: `GET /v1/metrics` (JSON ledger counters), `pricing.prices` on +`POST /v1/diff`, CLI **Per-1k token prices** output, matching web diff banner detail, and the +**[examples/README.md](examples/README.md)** end-to-end walkthrough. + +**v1.0.5 / next patch:** candidates include documentation completeness improvements (time-window +semantics, error message catalog, checksum format, tenant/task filter scope in UI), and +continued Phase 0 hardening. No breaking changes expected to stable CLI, HTTP, or +**`api_version` `v1`** contracts. --- diff --git a/docs/http-api.md b/docs/http-api.md index c05b438..5f45bad 100644 --- a/docs/http-api.md +++ b/docs/http-api.md @@ -105,13 +105,18 @@ List all registered releases. "agent_id": "agent_support", "version": "1.2.0", "environment": "production", - "checksum": "sha256:...", + "checksum": "a3f1c2e4b5d6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2", "created_at": "2026-05-01T12:00:00+00:00" } ] } ``` +`checksum` is a **64-character lowercase hex string** (raw SHA-256; no `sha256:` prefix). 
The +same value is printed with a `sha256=` label by `flightdeck release show` and +`flightdeck release verify` for human readability, but the stored and returned value is the +bare hex. + --- ## `GET /v1/promoted` @@ -137,13 +142,16 @@ List the currently promoted release for each `agent_id` / `environment` pair. List promotion and rollback actions from the audit ledger. +Results are returned **newest first** (`ORDER BY created_at DESC`), so the most recent action +is always the first element in the array. + **Query parameters** | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `agent` | string | — | Filter by `agent_id` | | `env` | string | — | Filter by environment | -| `limit` | integer | 50 | Max records returned (1–500) | +| `limit` | integer | 50 | Max records returned (1–500); server enforces a minimum of 1 and a maximum of 500 | **Response** ```json @@ -229,7 +237,11 @@ produce an error. **Errors** - HTTP 400 — unsupported `api_version` value, or a field in a `RunEvent` fails type/range - validation after the per-event `api_version` check. + validation after the per-event `api_version` check. Field validation errors include + the prefix `"Invalid RunEvent: "` in the `detail` string, e.g. + `"Invalid RunEvent: 1 validation error for RunEvent …"`. Client code that parses + error messages can key off this prefix to distinguish per-event validation failures + from `api_version` rejections. - HTTP 422 — `events` array is empty or the request body does not match the expected shape (Pydantic validation error; returned as an array under `detail`). @@ -301,9 +313,18 @@ the audit ledger. } ``` -`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required. +`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required. `N` must be +a positive integer. Seconds and weeks are not supported. Examples: `"7d"`, `"24h"`, +`"30m"`. Invalid formats return HTTP 400. 
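The `window` format rules above can be sketched as a small parser. This is a minimal illustration only: `parse_window_sketch` is a hypothetical name, not the project's own parser, and raising `ValueError` is an assumption standing in for whatever error the server maps to HTTP 400.

```python
import re
from datetime import timedelta

# Hypothetical sketch; the real parser may be structured differently.
# Accepts "<N>d", "<N>h", or "<N>m" where N is a positive integer.
_WINDOW_RE = re.compile(r"([0-9]+)([dhm])")
_UNIT_TO_KWARG = {"d": "days", "h": "hours", "m": "minutes"}


def parse_window_sketch(window: str) -> timedelta:
    """Convert a window string such as "7d" into a timedelta."""
    match = _WINDOW_RE.fullmatch(window)
    if match is None:
        # Covers unsupported units ("7w", "10s"), signs ("-7d"), and empty input.
        raise ValueError(f"Invalid window format: {window!r}")
    amount = int(match.group(1))
    if amount <= 0:
        # "0h" matches the pattern but is not a positive integer.
        raise ValueError(f"Window must be positive: {window!r}")
    return timedelta(**{_UNIT_TO_KWARG[match.group(2)]: amount})
```

Under these assumptions, `parse_window_sketch("24h")` yields a 24-hour `timedelta`, while `"0h"`, `"-7d"`, and `"7w"` are all rejected.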
+ `environment` defaults to `WorkspaceConfig.default_environment` when `null`. +**Time-window semantics:** run events are queried with `timestamp >= since AND timestamp < +until`. The interval is **half-open**: `since` is inclusive, `until` is exclusive. Both +boundaries are in UTC. `until` is set to the server's clock at the moment the request is +processed; `since` is `until - window_delta`. An event exactly at `until` is **not** +included. + **Response** ```json { diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md index eda5583..ba93ab6 100644 --- a/docs/operations-and-policy.md +++ b/docs/operations-and-policy.md @@ -86,7 +86,11 @@ compute_diff( 3. Load the pricing table for each release (provider + pricing_version from `spec.pricing_reference`). Missing tables raise `OperationError` with a hint to run `flightdeck pricing import`. -4. Parse `window` into a `timedelta`; compute `since = now - delta`, `until = now`. +4. Parse `window` into a `timedelta` via `ledger.parse_window`. Valid units are `d` + (days), `h` (hours), and `m` (minutes); seconds and weeks are not supported. The + numeric part must be a positive integer; `"0h"`, `"-7d"`, and `"7w"` all raise + `OperationError`. Compute `until = now` (UTC at call time), then `since = until - delta`. + Events are queried with `timestamp >= since AND timestamp < until` (half-open interval). 5. Query `run_events` for each release ID filtered by environment, tenant, task, and the time window. 6. Call `ledger.diff_releases` to compute per-side rollups (cost, latency, error rate), @@ -145,6 +149,24 @@ This can happen if `run_id` values from different agents were ingested under the `release_id`. Ensure every `RunEvent` for a release carries the correct `agent_id` matching `spec.agent.agent_id` in the release artifact. +### Diffs where one side has no run events + +`diff_releases` only runs the cross-agent consistency check when **both** sides +have events. 
If either side has zero events in the window, the consistency check is +skipped. The rollup for the empty side evaluates to zero runs, zero cost, no latency data, +and zero error rate. Confidence is determined by the sample count thresholds as normal: + +- With default thresholds (`min_candidate_runs=500`, `min_baseline_runs=500`, + `min_low_runs=50`), a baseline with zero runs will produce `LOW` confidence. +- With all thresholds set to `0` (staging policy), a diff with zero events on either side + can still reach `HIGH` confidence. + +**Practical implication:** if you register a new baseline with no run history and +immediately diff it against a candidate, the diff will complete without error, but +`baseline_runs` will be 0 and, with default thresholds, confidence will be `LOW`. +This is a valid signal: it means the baseline has no observable data to +compare against. + ### Pricing and model change detection `DiffOutcome` includes a `pricing_or_model_changed` flag that is `True` when any of the @@ -525,13 +547,18 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`). 
| Error | Cause | Fix | |-------|-------|-----| -| `Unknown baseline release: rel_...` | Release not registered | `flightdeck release register ` | -| `Missing pricing table for baseline openai/2024-02` | Pricing not imported | `flightdeck pricing import ` | +| `Unknown baseline release: rel_...` | Baseline release ID not registered | `flightdeck release register ` | +| `Unknown candidate release: rel_...` | Candidate release ID not registered | `flightdeck release register ` | +| `Missing pricing table for baseline openai/2024-02` | Pricing not imported for baseline provider/version | `flightdeck pricing import ` | +| `Missing pricing table for candidate openai/2024-02` | Pricing not imported for candidate provider/version | `flightdeck pricing import ` | +| `Missing pricing table for rollback target openai/2024-02` | Pricing not imported for promote/rollback target | `flightdeck pricing import ` | +| `Missing pricing table for promoted_baseline openai/2024-02` | Pricing for the currently-promoted baseline is not present | Import the missing table with `flightdeck pricing import ` | | `Cross-agent diff is not allowed` | Releases belong to different agents | Use releases from the same `agent_id` | | `Each side of the diff must have a single consistent agent_id among run events` | Ingested events for that release contain mixed `agent_id` values | Verify all `RunEvent` records use the correct `agent_id` matching the release artifact; re-ingest corrected events | | `Pricing table missing model entry` | Pricing table does not list the model used in the release | Add the model to the pricing YAML and reimport with `--replace` | | `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` | | `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first | +| `Promoted baseline release is missing: rel_...` | A promoted pointer exists but the 
referenced release record is gone (e.g. manual DB edit) | Restore from backup; then re-register the release if the artifact is available and promote it to reset the pointer | | `Workspace config not found: flightdeck.yaml` | Missing `flightdeck.yaml` | `flightdeck init` | --- diff --git a/docs/web-ui.md b/docs/web-ui.md index bfbfdb0..1afb9b5 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -149,6 +149,13 @@ Form-based interface for `POST /v1/diff`. Fields mirror the request body: | Window | `7d` | `window` | | Environment | `local` | `environment` (sent as `null` when empty) | +`tenant_id` and `task_id` are **not exposed** in the UI form. To run a diff narrowed to a +specific tenant or task, use the CLI (`flightdeck release diff --tenant --task `) +or call `POST /v1/diff` directly with the `tenant_id` and `task_id` fields. See +[http-api.md § POST /v1/diff](http-api.md#post-v1diff) and +[operations-and-policy.md § compute_diff vs. promote_release filter scope](operations-and-policy.md#compute_diff-vs-promote_release--rollback_release-filter-scope) +for details on what those filters affect. + On submit, the raw diff response is parsed and rendered as: - **Summary card:** policy badge (PASS / FAIL), failure reasons list, sample counts and