Merged
2 changes: 1 addition & 1 deletion .github/workflows/release-pypi.yml
@@ -1,4 +1,4 @@
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.0.4).
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.0.6).
# Configure "trusted publishing" on PyPI for this workflow + repository + optional GitHub environment.
# https://docs.pypi.org/trusted-publishers/

24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,30 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

## Unreleased

## 1.0.6 - 2026-05-02

### Added

- **CLI `flightdeck doctor --backup PATH`:** SQLite online backup of the workspace database to **`PATH`** (parent directories created; file overwritten if present), then the usual doctor checks.
- **Examples:** **[examples/integration/emit_sample_events.node.mjs](examples/integration/emit_sample_events.node.mjs)** — **`POST /v1/events`** sample using built-in **`fetch`** (Node 18+); **[examples/integration/README.md](examples/integration/README.md)** adds **`curl`** + **`jq`** example.
- **Docs:** **[examples/deploy/README.md](examples/deploy/README.md)** — Compose **`/health`** healthcheck and **`doctor --backup`** / cron scheduling notes.
- **Roadmap:** **Phase 0** declared **closed**; **catalog-level** multi-provider pricing normalization called out under **Phase 1** build items.
- **Tests:** **`test_doctor_backup_writes_valid_sqlite`** in **`tests/test_cli.py`**.

### Changed

- **Examples / CI snippets:** **`flightdeck-ai>=1.0.6`** in Docker and PyPI gate samples.

## 1.0.5 - 2026-05-02

### Added

- **CLI `release diff --output json`:** prints the same JSON object as **`POST /v1/diff`** (sorted keys) for **`jq`** / CI parsers; works with **`--fail-on-policy`** (JSON to stdout, then exit **1** on policy failure).
- **`POST /v1/diff`:** **`pricing.warnings`** — string list when baseline or candidate **`spec.runtime.model`** has no row in that side's imported pricing table (diagnostic only; **`policy`** unchanged). CLI prints matching **`WARNING:`** lines in text mode.
- **Web UI:** **Run diff** shows pricing warnings above the pricing/model-change banner; **Overview** adds a **Ledger metrics** card (**`GET /v1/metrics`**).
- **Docs:** **[docs/cli.md](docs/cli.md)** and **[docs/http-api.md](docs/http-api.md)** document **`--output json`** and **`pricing.warnings`**.
- **Tests:** **`test_release_diff_output_json_shape`**, **`test_release_diff_pricing_warnings_when_model_not_in_table`** in **`tests/test_spine.py`**; **`test_release_diff_fail_on_policy_with_json_output`** in **`tests/test_cli_contract.py`**; **`test_http_v1_diff_pricing_warnings_when_model_missing`** and **`pricing.warnings`** assertion on the happy path in **`tests/test_server_actions.py`**.

## 1.0.4 - 2026-05-03

### Added
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -133,7 +133,7 @@ Merging to **`main` does not publish packages** — PyPI uploads are **tag-drive
1. **PyPI:** add a **trusted publisher** for **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** — workflow **`release-pypi.yml`**. If PyPI offers **Environment name: (Any)**, you can still use a GitHub **Environment** named **`pypi`** for approval gates; otherwise match whatever you register on PyPI ([trusted publishers](https://docs.pypi.org/trusted-publishers/)).
2. **GitHub:** Settings → **Environments** → create **`pypi`** (optional: required reviewers / wait timer before OIDC publish).
3. Bump **`version`** in **`pyproject.toml`** and **`src/flightdeck/__init__.py`**, update **`CHANGELOG.md`**, merge to **`main`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.0.4`**) then **`git push origin vX.Y.Z`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.0.6`**) then **`git push origin vX.Y.Z`**.

The workflow runs **ruff**, **pytest**, schema drift, **`uv build`**, publishes **sdist + wheel** to **PyPI** via **OIDC** (no long-lived API token in repo secrets), enables **publish attestations**, and creates a **GitHub Release** with generated notes and **`dist/*`** assets.

8 changes: 8 additions & 0 deletions RELEASE_NOTES.md
@@ -4,6 +4,14 @@ High-level notes for **shipping FlightDeck**. Detailed history: **[CHANGELOG.md]

Narrative docs (including the CLI reference) are maintained on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** `main`; this file and **`schemas/`** ship in minimal clones.

## v1.0.6 — Phase 0 closure (backup, cross-language emitters, roadmap)

Patch release (see **[CHANGELOG.md](CHANGELOG.md)**): **`flightdeck doctor --backup PATH`** performs a SQLite online backup of the workspace DB; **[examples/integration/](examples/integration/README.md)** gains **`curl`** and a **Node** **`emit_sample_events.node.mjs`** path for **`POST /v1/events`**; **[examples/deploy/README.md](examples/deploy/README.md)** documents the Compose **`/health`** healthcheck and backup scheduling. **ROADMAP:** **Phase 0** is **closed**; **catalog-level** multi-provider pricing normalization is an explicit **Phase 1** build item. **Stable contracts:** additive CLI flag and HTTP field **`pricing.warnings`** (from **v1.0.5**) remain backward-compatible.

## v1.0.5 — Diff JSON output, pricing warnings, metrics in Overview

Patch release (see **[CHANGELOG.md](CHANGELOG.md)**): **`flightdeck release diff --output json`** matches **`POST /v1/diff`** for machine consumers; **`pricing.warnings`** surfaces missing pricing-table rows for a release's resolved model (CLI **`WARNING:`** lines + web); **Overview** shows **`GET /v1/metrics`** counters. **Stable contracts:** additive only.

## v1.0.4 — Phase 0 closing slice (pricing diagnostic, examples index, metrics)

Patch release (see **[CHANGELOG.md](CHANGELOG.md)**): **`GET /v1/metrics`** exposes additive JSON counters for operators; **`POST /v1/diff`** and **`flightdeck release diff`** add **`pricing.prices`** / a **Per-1k token prices** line when pricing or model differs, so cost deltas are easier to interpret; **[examples/README.md](examples/README.md)** ties **integration**, **CI**, and **deploy** examples into one loop; web **Run diff** shows the same unit-price deltas when present. **Stable contracts:** additive HTTP and CLI output only; no **`v1`** payload or schema removals.
27 changes: 13 additions & 14 deletions ROADMAP.md
@@ -14,22 +14,14 @@ This roadmap is meant to be clear from **what is already shipped** to **near-ter
- **Economic + operational governance:** immutable pricing imports, trusted `release diff`, policy-gated `promote` and `rollback`.
- **Audit trail:** promotion/rollback history with stable sequencing (`audit_seq`) and integrity checks via `doctor`.
- **Evidence ingestion:** `runs ingest` from JSONL/JSON arrays plus stable `POST /v1/events` contracts (`schemas/v1/`).
- **Local API + UI:** `flightdeck serve` routes and web UI (Overview, Diff, Promote) in `src/flightdeck/server/static/`.
- **Local API + UI:** `flightdeck serve` routes and web UI (Overview with ledger metrics, Diff, Promote) in `src/flightdeck/server/static/`.
- **SDK and tooling:** Python sync/async clients with retries/batching and `flightdeck-quickstart-verify`.

---

## Next release

**v1.0.4 is shipped.** See **[CHANGELOG.md](CHANGELOG.md)** for the full list of additions. The
v1.0.4 slice delivered: `GET /v1/metrics` (JSON ledger counters), `pricing.prices` on
`POST /v1/diff`, CLI **Per-1k token prices** output, matching web diff banner detail, and the
**[examples/README.md](examples/README.md)** end-to-end walkthrough.

**v1.0.5 / next patch:** candidates include documentation completeness improvements (time-window
semantics, error message catalog, checksum format, tenant/task filter scope in UI), and
continued Phase 0 hardening. No breaking changes expected to stable CLI, HTTP, or
**`api_version` `v1`** contracts.
**v1.0.6** (patch): Phase 0 closure — **`flightdeck release diff --output json`** (same shape as **`POST /v1/diff`**); **`pricing.warnings`** when a release model has no row in its pricing table (CLI **`WARNING:`** lines + web Diff); **Overview** ledger metrics card (**`GET /v1/metrics`**); **`curl`** + **Node** samples under **[examples/integration/](examples/integration/README.md)**; **`flightdeck doctor --backup PATH`** (SQLite online backup); **[examples/deploy/](examples/deploy/README.md)** documents Compose **`/health`** healthcheck and backup scheduling. **Phase 0** is declared **closed**; **catalog-level** multi-provider normalization moves to **Phase 1**. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts.

---

@@ -57,11 +49,11 @@ Goal: prove the wedge with real teams using FlightDeck as release governance sou

- Harden CLI/schema contracts and edge-case policy coverage (sample windows, sparse traffic, error paths).
- Add concrete integration references: app runtime event emitters, CI pipeline examples, and deployment recipes for `flightdeck serve`.
- Improve pricing normalization for multiple provider inputs while keeping diff semantics stable.
- **Catalog-level cross-vendor pricing normalization** — deferred to **Phase 1** (see Phase 1 build list). **v1.0.4–v1.0.6** ship per-side **`pricing.prices`** and **`pricing.warnings`** diagnostics only.
- Strengthen local security ergonomics: explicit token/env status in UI, mutation guardrails, optional read-only UX.
- Continue UI productization for current scope (structured views over raw JSON where stable).

### Phase 0 progress (v1.0.3–v1.0.4)
### Phase 0 progress (v1.0.3–v1.0.6)

Shipped on **`main`**:

@@ -72,8 +64,14 @@ Shipped on **`main`**:
- **Pricing diagnostics (v1.0.4):** **`pricing.prices`** on **`POST /v1/diff`** and matching CLI / web lines for per-1k input/output unit prices when pricing or model differs.
- **Operating narrative (v1.0.4):** **[examples/README.md](examples/README.md)** index tying emit → ingest → verify → diff/gate → promote → serve.
- **Observability foundation (v1.0.4):** **`GET /v1/metrics`** JSON counters over the local ledger (not Prometheus/OTel; longer arc stays mid term).
- **Diff ergonomics (v1.0.5):** **`flightdeck release diff --output json`**; **`pricing.warnings`** on **`POST /v1/diff`** / CLI / web when the release model is missing from the imported pricing table; **Overview** shows key **`GET /v1/metrics`** counters.
- **Operator + pipeline breadth (v1.0.6):** **`curl`** and **Node** **`emit_sample_events.node.mjs`** under **[examples/integration/](examples/integration/README.md)**; **`flightdeck doctor --backup`**; deploy README covers healthcheck + backup scheduling.

### Phase 0 status

**Phase 0 is closed** as of **v1.0.6** for the local-first wedge (immutable releases, evidence ingest, diff + policy gate, promote/rollback, audit, CI/deploy/integration references, metrics, diagnostics, and operator backup ergonomics).

**Still open in Phase 0** (see gaps table and Phase 1): **catalog-level** multi-provider normalization (single comparable unit across vendors), deeper **event pipeline** and **fleet** ergonomics, and **OTLP-oriented** telemetry remain beyond this patch.
**Carried forward to Phase 1** (see gaps table): **catalog-level** multi-provider pricing normalization (single comparable unit across vendors), deeper **fleet** ergonomics, and **OTLP-oriented** telemetry — not blocking further patch releases on the Phase 0 spine.

### Phase-0 success signals

@@ -91,7 +89,8 @@ Goal: move from solid local tooling to repeatable production usage patterns.
### Build in this phase

- Human-in-the-loop approval workflow on top of policy gates (without requiring a hosted control plane).
- Stronger multi-provider pricing normalization and clearer mismatch diagnostics.
- **Catalog-level multi-provider pricing normalization** — single comparable tariff unit across vendors; additive to today's per-provider **`pricing import`** tables and **`pricing.prices`** / **`pricing.warnings`** diagnostics.
- Stronger mismatch diagnostics beyond table row presence (for example version skew hints) as needed for the catalog work.
- Incident forensics improvements (replay/trace-style analysis over ingested evidence) as governance support tooling.
- Deployment hardening artifacts (for example Helm or equivalent) if a blessed server topology is chosen.
- Multi-workspace operator ergonomics (naming, templates, reproducible setup patterns).
13 changes: 11 additions & 2 deletions docs/cli.md
@@ -71,10 +71,14 @@ in a shared repo. See [release-artifact.md § Workspace config](release-artifact
Run read-only health checks on the local SQLite ledger.

```bash
flightdeck doctor
flightdeck doctor [--backup PATH]
```

No flags. Calls `Storage.migrate()` at start (idempotent) and then checks:
Calls `Storage.migrate()` at start (idempotent). With **`--backup PATH`**, runs an SQLite
online backup of the workspace database to **`PATH`** (parent directories are created;
an existing file is overwritten), then runs the checks below.

Without **`--backup`**, only the checks run. In both cases **`migrate()`** runs first.
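
For readers unfamiliar with SQLite's online backup mechanism, here is a minimal Python sketch of the behavior described above: create parent directories, overwrite an existing file, and copy a live database. It relies on the standard library's `sqlite3.Connection.backup`. The function name `backup_sqlite` is hypothetical, and this is not necessarily how `flightdeck doctor --backup` is implemented.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source_db: Path, dest: Path) -> None:
    """Copy a SQLite database to `dest` using the online backup API.

    Hypothetical helper for illustration only; not FlightDeck's own code.
    """
    # Create any missing parent directories for the destination.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Overwrite semantics: start from a fresh destination file.
    if dest.exists():
        dest.unlink()
    src = sqlite3.connect(source_db)
    dst = sqlite3.connect(dest)
    try:
        # Connection.backup copies every page of the source database.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Going through the backup API rather than a plain file copy is what allows the snapshot to be taken while other connections have the database open.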

| Check | What it verifies |
|-------|-----------------|
@@ -216,11 +220,16 @@ flightdeck release diff BASELINE_ID CANDIDATE_ID --window WINDOW [OPTIONS]
| `--tenant` | Filter events by `tenant_id` |
| `--task` | Filter events by `task_id` |
| `--fail-on-policy` | After printing the diff, exit **1** when the active policy does not pass (for CI gates). |
| `--output` | `text` (default) or `json`. **`json`**: same JSON object as **`POST /v1/diff`** (stable keys for `jq` / CI parsers). With **`--fail-on-policy`**, JSON is still printed to stdout before exit **1**. |

Both releases must have the same `agent_id`. Cross-agent diffs are rejected with exit 1.

**Exit codes:** invalid input, missing pricing, or other `OperationError` → non-zero. With **`--fail-on-policy`**, a computed diff whose policy result is **FAIL** also exits **1** (after the usual stdout).

When a release's resolved model has **no row** in its pricing table, the diff still completes
(provided rollups do not need that rate for ingested events). In text mode the CLI prints
**`WARNING:`** lines; JSON output includes **`pricing.warnings`**. Both are diagnostic only; policy is unchanged.

The diff is a **read-only computation** — it does not write to the audit ledger or update
any promoted pointers.
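
A CI wrapper around the flags documented above might look like the following sketch. It assumes only what this section states: `--output json` prints the diff object to stdout, and `--fail-on-policy` exits **1** after printing when the policy fails. The helper name `run_diff_gate` and the injectable `run` parameter are hypothetical, added so the logic can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Tuple


def run_diff_gate(
    baseline_id: str,
    candidate_id: str,
    window: str,
    run: Callable[..., Any] = subprocess.run,
) -> Tuple[Dict[str, Any], bool]:
    """Return (diff_json, policy_passed) from `flightdeck release diff`."""
    cmd = [
        "flightdeck", "release", "diff", baseline_id, candidate_id,
        "--window", window, "--output", "json", "--fail-on-policy",
    ]
    proc = run(cmd, capture_output=True, text=True)
    try:
        diff = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        # No JSON on stdout: treat it as an operational error
        # (invalid input, missing pricing, and so on).
        raise RuntimeError(
            f"release diff failed with exit code {proc.returncode}"
        ) from exc
    # Assumption: once valid JSON was printed, a non-zero exit code
    # means the policy gate failed rather than the command itself.
    return diff, proc.returncode == 0
```

Separating "no JSON at all" from "JSON plus exit 1" lets a pipeline report policy failures with the full diff attached while still failing loudly on operational errors.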

9 changes: 8 additions & 1 deletion docs/http-api.md
@@ -351,7 +351,8 @@ included.
"candidate_input_usd_per_1k_tokens": 0.0045,
"candidate_output_usd_per_1k_tokens": 0.0135,
"candidate_cached_input_usd_per_1k_tokens": null
}
},
"warnings": []
},
"samples": {
"baseline_runs": 1200,
@@ -379,6 +380,12 @@ included.
}
```

**`pricing.warnings`** — array of human-readable strings when the baseline or candidate
release's **`spec.runtime.model`** has no matching entry in that side's imported pricing
table. Per-side **`prices.*`** fields are **`null`** in that case. Warnings are **informational
only** and do not change **`policy`**. If ingested run events reference a model that cannot
be priced, the diff request still fails with HTTP 400 as before.

**Confidence levels**

| Label | Meaning |
17 changes: 14 additions & 3 deletions docs/operations-and-policy.md
@@ -190,6 +190,11 @@ following differ between baseline and candidate:
These fields are populated by `pricing_entry_for(table, model)` in `flightdeck.ledger` after
`diff_releases` returns and before the `DiffOutcome` is constructed.

`DiffOutcome.pricing_warnings` is a tuple of human-readable strings when the release artifact's
`spec.runtime.model` has **no** matching row in that side's imported pricing table. Warnings are
**diagnostic only** (they do not change `policy`). If ingested events reference a model that
cannot be priced, `compute_rollup` still raises and `compute_diff` surfaces that as before.

**CLI output** — when `pricing_or_model_changed` is `True`, the CLI prints:

```
@@ -200,8 +205,12 @@ Per-1k token prices: input 0.005000 -> 0.004500, output 0.015000 -> 0.013500
The **Per-1k token prices** line is only printed when both input and output rates are present
for both sides. If any rate is `None`, that line is omitted.

When `pricing_warnings` is non-empty, the CLI also prints one **`WARNING:`** line per string
before the `NOTE:` / per-1k lines.
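
The ordering and omission rules above can be sketched as a small pure function. This is an illustrative reconstruction, not the CLI's actual code: the name `render_pricing_lines` and its parameters are hypothetical, the `NOTE:` line is left out because its exact text is not shown in this diff, and the six-decimal format is inferred from the sample output.

```python
from typing import List, Optional, Sequence


def render_pricing_lines(
    warnings: Sequence[str],
    pricing_or_model_changed: bool,
    baseline_in: Optional[float],
    candidate_in: Optional[float],
    baseline_out: Optional[float],
    candidate_out: Optional[float],
) -> List[str]:
    """Build the WARNING and per-1k lines in the documented order."""
    # One WARNING line per string, emitted before any other pricing line.
    lines = [f"WARNING: {message}" for message in warnings]
    rates = (baseline_in, candidate_in, baseline_out, candidate_out)
    # The per-1k line needs all four rates; any None suppresses it.
    if pricing_or_model_changed and all(rate is not None for rate in rates):
        lines.append(
            "Per-1k token prices: "
            f"input {baseline_in:.6f} -> {candidate_in:.6f}, "
            f"output {baseline_out:.6f} -> {candidate_out:.6f}"
        )
    return lines
```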

**HTTP API** — `/v1/diff` includes a `pricing.prices` object alongside the existing
`pricing_or_model_changed` flag:
`pricing_or_model_changed` flag and a `pricing.warnings` string array (empty when both models
resolve to a table row):

```json
"pricing": {
@@ -219,14 +228,16 @@ for both sides. If any rate is `None`, that line is omitted.
"candidate_input_usd_per_1k_tokens": 0.0045,
"candidate_output_usd_per_1k_tokens": 0.0135,
"candidate_cached_input_usd_per_1k_tokens": null
}
},
"warnings": []
}
```

`pricing.prices` is always present in the response (not gated on `pricing_or_model_changed`).
Fields are `null` when the rate is not set in the pricing table.
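
A client reading this block may want to tolerate both `null` rates and responses from servers that predate `pricing.warnings` (added in v1.0.5). The sketch below shows one defensive way to do that. The helper name `pricing_diagnostics` is hypothetical, and it deliberately iterates over whatever keys `pricing.prices` contains rather than assuming a fixed key list.

```python
from typing import Any, List, Mapping, Tuple


def pricing_diagnostics(diff: Mapping[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (warnings, unset_price_fields) from a /v1/diff response body."""
    # Treat a missing or null "pricing" block as empty.
    pricing = diff.get("pricing") or {}
    prices = pricing.get("prices") or {}
    # Older servers omit "warnings"; default to an empty list.
    warnings = [str(w) for w in (pricing.get("warnings") or [])]
    # A null price means the rate is not set in the pricing table.
    unset = sorted(name for name, value in prices.items() if value is None)
    return warnings, unset
```

Defaulting missing keys to empty containers keeps the same client code working against any server that honors the additive `v1` contract.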

**Web UI** — the `DiffPage` `fd-alert--warn` banner shows the per-1k input/output price deltas
**Web UI** — the `DiffPage` shows `pricing.warnings` as a warning list when non-empty, then the
`fd-alert--warn` banner for `pricing_or_model_changed` when applicable, and the per-1k input/output price deltas
(baseline → candidate) when all four rates are present. See [web-ui.md § DiffPage](web-ui.md).

This is an informational signal — the diff still computes and the policy still evaluates; cost
9 changes: 6 additions & 3 deletions docs/web-ui.md
@@ -16,7 +16,7 @@ The app uses **HashRouter** (`react-router-dom`) so all navigation stays within

| Hash path | Component | HTTP calls | Notes |
|-----------|-----------|-----------|-------|
| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions` (parallel) | |
| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`, `GET /v1/metrics` (parallel where applicable) | Ledger metrics card is read-only counters |
| `#/diff` | `DiffPage` | `POST /v1/diff` | |
| `#/actions` | `ActionsPage` | `POST /v1/promote` or `POST /v1/rollback` | Redirects to `#/` when `VITE_FLIGHTDECK_UI_READ_ONLY=true` |
| `#/*` (any other) | — | Redirects to `#/` | |
@@ -121,10 +121,11 @@ fail. This is a configuration hint only — the server enforces the actual gate.

## `OverviewPage` (`web/src/pages/OverviewPage.tsx`)

Read-only dashboard. Renders three tables from `loadTimeline()` output:
Read-only dashboard. Renders a **Ledger metrics** card from `fetchMetrics()` plus three tables from `loadTimeline()` output:

| Table | Source | Columns |
| Block | Source | Content |
|-------|--------|---------|
| Ledger metrics | `GET /v1/metrics` | Releases, pricing tables, run events, promoted pointers, and actions totals (plus `actions_by_action` breakdown), `schema_version`, `generated_at` |
| Releases | `GET /v1/releases` | Release ID, Agent, Version, Environment, Checksum, Created |
| Promoted | `GET /v1/promoted` | Agent, Environment, Active release |
| Recent actions | `GET /v1/actions` | When, Action, Policy (PASS/FAIL badge), Release, Environment, Reason |
@@ -160,6 +161,8 @@ On submit, the raw diff response is parsed and rendered as:

- **Summary card:** policy badge (PASS / FAIL), failure reasons list, sample counts and
confidence label (including `confidence_reason` when present).
- **Pricing table warnings:** when `pricing.warnings` is a non-empty string array, a
`fd-alert--warn` list is shown above the pricing/model-change banner (diagnostic only).
- **Pricing change warning:** when the diff response includes a `pricing` block with
`pricing_or_model_changed: true`, a `fd-alert--warn` banner is shown in the summary
card. It names the baseline and candidate provider/version/model so the user knows the
2 changes: 1 addition & 1 deletion examples/ci/README.md
@@ -37,7 +37,7 @@ uv run python examples/ci/ledger_gate.py
Example (**PyPI** install):

```bash
pip install "flightdeck-ai>=1.0.4"
pip install "flightdeck-ai>=1.0.6"
export WORKSPACE="$(mktemp -d)"
export QUICKSTART_ROOT=/path/to/flightdeck/examples/quickstart
python /path/to/flightdeck/examples/ci/ledger_gate.py
2 changes: 1 addition & 1 deletion examples/ci/github-actions/policy-gate-pypi.yml
@@ -11,7 +11,7 @@ on:
env:
# Pin to a tag or SHA that matches your installed flightdeck-ai version when possible.
FLIGHTDECK_REF: main
FLIGHTDECK_AI_SPEC: ">=1.0.4"
FLIGHTDECK_AI_SPEC: ">=1.0.6"

jobs:
ledger-gate: