flightdeckdev · Gsbreddy · May 2, 2026 · May 2, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -63,7 +63,7 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**
 
 ### Changed
 
-- **Slim distribution:** this repository omits the full in-tree **`docs/`** tree, org mirror scripts, and **`verify-repo-standards`** wrappers. Narrative docs and maintainer runbooks live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable.
+- **Slim distribution:** this repository ships a focused in-tree **`docs/`** tree (CLI, HTTP API, SDK, operations/policy, release artifact, web UI references); org mirror scripts and **`verify-repo-standards`** wrappers are not included. Extended maintainer runbooks and the canonical README live on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)**; in-repo links now point there where applicable.
 - **`pyproject.toml`:** OpenTelemetry packages are **optional** only (**`telemetry`** / **`all`** extras); the default install matches the **1.0.0** dependency story (core does not import OpenTelemetry).
 - **`.pre-commit-config.yaml`:** **ruff** replaces **black** / **isort**; **`ruff-pre-commit`** pinned to **v0.15.12** to match **`dev`** (**`ruff==0.15.12`**).
 - **CI:** Python **3.13** and **3.14** added to the Ubuntu and Windows matrices (superseded by **3.14**-only policy as of **1.0.2**).

diff --git a/ROADMAP.md b/ROADMAP.md
@@ -21,7 +21,15 @@ This roadmap is meant to be clear from **what is already shipped** to **near-ter
 
 ## Next release
 
-**v1.0.4** (patch): Phase 0 closing slice — **`GET /v1/metrics`** (JSON ledger counters); **`pricing.prices`** on **`POST /v1/diff`** plus CLI **Per-1k token prices** line and matching web diff detail when pricing/model changes; **[examples/README.md](examples/README.md)** end-to-end walkthrough linking **integration**, **CI**, and **deploy** examples. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts.
+**v1.0.4 is shipped.** See **[CHANGELOG.md](CHANGELOG.md)** for the full list of additions. The
+v1.0.4 slice delivered: `GET /v1/metrics` (JSON ledger counters), `pricing.prices` on
+`POST /v1/diff`, CLI **Per-1k token prices** output, matching web diff banner detail, and the
+**[examples/README.md](examples/README.md)** end-to-end walkthrough.
+
+**v1.0.5 / next patch:** candidates include documentation completeness improvements (time-window
+semantics, error message catalog, checksum format, tenant/task filter scope in UI), and
+continued Phase 0 hardening. No breaking changes expected to stable CLI, HTTP, or
+**`api_version` `v1`** contracts.
 
 ---
 

diff --git a/docs/http-api.md b/docs/http-api.md
@@ -105,13 +105,18 @@ List all registered releases.
       "agent_id": "agent_support",
       "version": "1.2.0",
       "environment": "production",
-      "checksum": "sha256:...",
+      "checksum": "a3f1c2e4b5d6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2",
       "created_at": "2026-05-01T12:00:00+00:00"
     }
   ]
 }
 ```
 
+`checksum` is a **64-character lowercase hex string** (raw SHA-256; no `sha256:` prefix). The
+same value is printed with a `sha256=` label by `flightdeck release show` and
+`flightdeck release verify` for human readability, but the stored and returned value is the
+bare hex.
+
 ---
 
 ## `GET /v1/promoted`
@@ -137,13 +142,16 @@ List the currently promoted release for each `agent_id` / `environment` pair.
 
 List promotion and rollback actions from the audit ledger.
 
+Results are returned **newest first** (`ORDER BY created_at DESC`), so the most recent action
+is always the first element in the array.
+
 **Query parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `agent` | string | — | Filter by `agent_id` |
 | `env` | string | — | Filter by environment |
-| `limit` | integer | 50 | Max records returned (1–500) |
+| `limit` | integer | 50 | Max records returned; the server enforces a minimum of 1 and a maximum of 500 |
 
 **Response**
 ```json
@@ -229,7 +237,11 @@ produce an error.
 
 **Errors**
 - HTTP 400 — unsupported `api_version` value, or a field in a `RunEvent` fails type/range
-  validation after the per-event `api_version` check.
+  validation after the per-event `api_version` check. Field validation errors include
+  the prefix `"Invalid RunEvent: "` in the `detail` string, e.g.
+  `"Invalid RunEvent: 1 validation error for RunEvent …"`. Client code that parses
+  error messages can key off this prefix to distinguish per-event validation failures
+  from `api_version` rejections.
 - HTTP 422 — `events` array is empty or the request body does not match the expected shape
   (Pydantic validation error; returned as an array under `detail`).
 
@@ -301,9 +313,18 @@ the audit ledger.
 }
 ```
 
-`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required.
+`window` format: `{N}d` (days), `{N}h` (hours), `{N}m` (minutes). Required. `N` must be
+a positive integer. Seconds and weeks are not supported. Examples: `"7d"`, `"24h"`,
+`"30m"`. Invalid formats return HTTP 400.
+
 `environment` defaults to `WorkspaceConfig.default_environment` when `null`.
 
+**Time-window semantics:** run events are queried with `timestamp >= since AND timestamp <
+until`. The interval is **half-open**: `since` is inclusive, `until` is exclusive. Both
+boundaries are in UTC. `until` is set to the server's clock at the moment the request is
+processed; `since` is `until - window_delta`. An event exactly at `until` is **not**
+included.
+
 **Response**
 ```json
 {

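To illustrate the `checksum` format and the HTTP 400 `detail` prefix described in the `docs/http-api.md` hunks above, here is a minimal client-side sketch. It assumes only what the added text states: the API returns a bare 64-character lowercase hex digest, the CLI prints the same value with a `sha256=` label, and per-event validation failures start with `"Invalid RunEvent: "`. The helper names `normalize_checksum` and `is_run_event_validation_error` are hypothetical and not part of FlightDeck.

```python
import re

# Hypothetical client-side helpers; none of these names come from FlightDeck.
_BARE_SHA256 = re.compile(r"[0-9a-f]{64}")   # bare hex, as returned by the API
_CLI_LABEL = "sha256="                        # label the CLI prints, per the docs
_EVENT_ERROR_PREFIX = "Invalid RunEvent: "    # prefix documented for HTTP 400


def normalize_checksum(value: str) -> str:
    """Return the bare hex digest, accepting the CLI's `sha256=` label."""
    text = value.strip()
    if text.startswith(_CLI_LABEL):
        text = text[len(_CLI_LABEL):]
    # Anything else (uppercase hex, a `sha256:` prefix, wrong length) is rejected.
    if _BARE_SHA256.fullmatch(text) is None:
        raise ValueError(f"expected 64 lowercase hex characters, got {value!r}")
    return text


def is_run_event_validation_error(detail: str) -> bool:
    """True when a 400 `detail` string reports a per-event validation failure."""
    return detail.startswith(_EVENT_ERROR_PREFIX)
```

A client could run `normalize_checksum` on CLI output before comparing it with the `checksum` field of a release record, and use `is_run_event_validation_error` to separate field validation failures from `api_version` rejections.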
diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md
@@ -86,7 +86,11 @@ compute_diff(
 3. Load the pricing table for each release (provider + pricing_version from
    `spec.pricing_reference`). Missing tables raise `OperationError` with a hint to run
    `flightdeck pricing import`.
-4. Parse `window` into a `timedelta`; compute `since = now - delta`, `until = now`.
+4. Parse `window` into a `timedelta` via `ledger.parse_window`. Valid units are `d`
+   (days), `h` (hours), and `m` (minutes) — seconds and weeks are not supported. The
+   numeric part must be a positive integer; `"0h"`, `"-7d"`, and `"7w"` all raise
+   `OperationError`. Set `until = now` (UTC at call time), then compute `since = until - delta`.
+   Events are queried with `timestamp >= since AND timestamp < until` (half-open interval).
 5. Query `run_events` for each release ID filtered by environment, tenant, task, and the
    time window.
 6. Call `ledger.diff_releases` to compute per-side rollups (cost, latency, error rate),
@@ -145,6 +149,24 @@ This can happen if `run_id` values from different agents were ingested under the
 `release_id`. Ensure every `RunEvent` for a release carries the correct `agent_id`
 matching `spec.agent.agent_id` in the release artifact.
 
+### Diffs where one side has no run events
+
+`diff_releases` only runs the cross-agent consistency check when **both** sides
+have events. If one side (or both) has zero events in the window, the consistency check is
+skipped. The rollup for the empty side evaluates to zero runs, zero cost, no latency data,
+and zero error rate. Confidence is determined by the sample count thresholds as normal:
+
+- With default thresholds (`min_candidate_runs=500`, `min_baseline_runs=500`,
+  `min_low_runs=50`), a baseline with zero runs will produce `LOW` confidence.
+- With all thresholds set to `0` (staging policy), zero events on either side can reach
+  `HIGH` confidence.
+
+**Practical implication:** if you register a new baseline with no run history and
+immediately diff it against a candidate, the diff will complete without error, but
+`baseline_runs` will be 0 and, with default thresholds, confidence will be `LOW`.
+This is a valid signal: it means the baseline has no observable data to
+compare against.
+
 ### Pricing and model change detection
 
 `DiffOutcome` includes a `pricing_or_model_changed` flag that is `True` when any of the
@@ -525,13 +547,18 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).
 
 | Error | Cause | Fix |
 |-------|-------|-----|
-| `Unknown baseline release: rel_...` | Release not registered | `flightdeck release register <path>` |
-| `Missing pricing table for baseline openai/2024-02` | Pricing not imported | `flightdeck pricing import <path>` |
+| `Unknown baseline release: rel_...` | Baseline release ID not registered | `flightdeck release register <path>` |
+| `Unknown candidate release: rel_...` | Candidate release ID not registered | `flightdeck release register <path>` |
+| `Missing pricing table for baseline openai/2024-02` | Pricing not imported for baseline provider/version | `flightdeck pricing import <path>` |
+| `Missing pricing table for candidate openai/2024-02` | Pricing not imported for candidate provider/version | `flightdeck pricing import <path>` |
+| `Missing pricing table for rollback target openai/2024-02` | Pricing not imported for promote/rollback target | `flightdeck pricing import <path>` |
+| `Missing pricing table for promoted_baseline openai/2024-02` | Pricing for the currently-promoted baseline is not present | Import the missing table with `flightdeck pricing import <path>` |
 | `Cross-agent diff is not allowed` | Releases belong to different agents | Use releases from the same `agent_id` |
 | `Each side of the diff must have a single consistent agent_id among run events` | Ingested events for that release contain mixed `agent_id` values | Verify all `RunEvent` records use the correct `agent_id` matching the release artifact; re-ingest corrected events |
 | `Pricing table missing model entry` | Pricing table does not list the model used in the release | Add the model to the pricing YAML and reimport with `--replace` |
 | `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
 | `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |
+| `Promoted baseline release is missing: rel_...` | A promoted pointer exists but the referenced release record is gone (e.g. manual DB edit) | Restore from backup; then re-register the release if the artifact is available and promote it to reset the pointer |
 | `Workspace config not found: flightdeck.yaml` | Missing `flightdeck.yaml` | `flightdeck init` |
 
 ---

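The window rules added to step 4 in `docs/operations-and-policy.md` above can be sketched as follows. This is an illustrative stand-in, not the project's `ledger.parse_window`: it raises `ValueError` rather than `OperationError`, and the names `parse_window`, `window_bounds`, and `in_window` are assumptions made for this example. It follows only what the text states: units `d`, `h`, `m`, a positive integer count, and a half-open `[since, until)` interval in UTC.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical stand-in for the documented rules; not FlightDeck's implementation.
_WINDOW = re.compile(r"([0-9]+)([dhm])")
_UNIT_TO_KWARG = {"d": "days", "h": "hours", "m": "minutes"}


def parse_window(window: str) -> timedelta:
    """Parse '{N}d', '{N}h' or '{N}m' where N is a positive integer."""
    match = _WINDOW.fullmatch(window)
    if match is None or int(match.group(1)) <= 0:
        raise ValueError(f"invalid window: {window!r}")
    return timedelta(**{_UNIT_TO_KWARG[match.group(2)]: int(match.group(1))})


def window_bounds(
    window: str, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return (since, until), with until = now (UTC) and since = until - delta."""
    until = now if now is not None else datetime.now(timezone.utc)
    return until - parse_window(window), until


def in_window(timestamp: datetime, since: datetime, until: datetime) -> bool:
    """Half-open check: since is inclusive, until is exclusive."""
    return since <= timestamp < until
```

With these definitions, `"0h"`, `"-7d"`, and `"7w"` are all rejected, and an event stamped exactly at `until` falls outside the window while one stamped exactly at `since` falls inside it.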
diff --git a/docs/web-ui.md b/docs/web-ui.md
@@ -149,6 +149,13 @@ Form-based interface for `POST /v1/diff`. Fields mirror the request body:
 | Window | `7d` | `window` |
 | Environment | `local` | `environment` (sent as `null` when empty) |
 
+`tenant_id` and `task_id` are **not exposed** in the UI form. To run a diff narrowed to a
+specific tenant or task, use the CLI (`flightdeck release diff --tenant <id> --task <id>`)
+or call `POST /v1/diff` directly with the `tenant_id` and `task_id` fields. See
+[http-api.md § POST /v1/diff](http-api.md#post-v1diff) and
+[operations-and-policy.md § compute_diff vs. promote_release / rollback_release filter scope](operations-and-policy.md#compute_diff-vs-promote_release--rollback_release-filter-scope)
+for details on what those filters affect.
+
 On submit, the raw diff response is parsed and rendered as:
 
 - **Summary card:** policy badge (PASS / FAIL), failure reasons list, sample counts and