2 changes: 1 addition & 1 deletion .github/workflows/release-pypi.yml
@@ -1,4 +1,4 @@
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.0.3).
# Publish sdist + wheel to PyPI when a SemVer tag is pushed (e.g. v1.0.4).
# Configure "trusted publishing" on PyPI for this workflow + repository + optional GitHub environment.
# https://docs.pypi.org/trusted-publishers/

16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,22 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

## Unreleased

## 1.0.4 - 2026-05-03

### Added

- **HTTP `GET /v1/metrics`:** read-only JSON counters for the local ledger (`releases_total`, `pricing_tables_total`, `run_events_total`, `promoted_pointers_total`, `actions_total`, `actions_by_action`) plus `schema_version` and `generated_at`; backed by **`Storage.get_ledger_counters()`**.
- **`POST /v1/diff`:** `pricing.prices` — per-side input/output/cached-input USD per 1k tokens for the resolved model (mirrors table entries; helps separate tariff changes from token volume).
- **CLI `release diff`:** when pricing or model differs, prints **Per-1k token prices** after the existing NOTE line.
- **Web UI (Run diff):** shows per-1k input/output price deltas under the pricing/model-change banner when those numbers are present.
- **Docs:** [examples/README.md](examples/README.md) operating walkthrough; [docs/http-api.md](docs/http-api.md) documents **`GET /v1/metrics`** and **`pricing.prices`**; [docs/cli.md](docs/cli.md) documents the new diff output line.
- **Tests:** **`test_v1_metrics_returns_counters`** in **`tests/test_server_health.py`**; **`POST /v1/diff`** `pricing.prices` assertions on cross-model diff in **`tests/test_spine.py`**.

### Changed

- **Roadmap:** **Next release** and **Phase 0 progress** updated for **v1.0.4** (pricing diagnostic, examples index, metrics endpoint).
- **Examples / CI snippets:** **`flightdeck-ai>=1.0.4`** in Docker and PyPI gate samples.

## 1.0.3 - 2026-05-03

### Added
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -119,7 +119,7 @@ Merging to **`main` does not publish packages** — PyPI uploads are **tag-drive
1. **PyPI:** add a **trusted publisher** for **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** — workflow **`release-pypi.yml`**. If PyPI offers **Environment name: (Any)**, you can still use a GitHub **Environment** named **`pypi`** for approval gates; otherwise match whatever you register on PyPI ([trusted publishers](https://docs.pypi.org/trusted-publishers/)).
2. **GitHub:** Settings → **Environments** → create **`pypi`** (optional: required reviewers / wait timer before OIDC publish).
3. Bump **`version`** in **`pyproject.toml`** and **`src/flightdeck/__init__.py`**, update **`CHANGELOG.md`**, merge to **`main`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.0.3`**) then **`git push origin vX.Y.Z`**.
4. **`git tag vX.Y.Z`** (must match **`pyproject.toml`** exactly, e.g. **`v1.0.4`**) then **`git push origin vX.Y.Z`**.

The workflow runs **ruff**, **pytest**, schema drift, **`uv build`**, publishes **sdist + wheel** to **PyPI** via **OIDC** (no long-lived API token in repo secrets), enables **publish attestations**, and creates a **GitHub Release** with generated notes and **`dist/*`** assets.

4 changes: 4 additions & 0 deletions RELEASE_NOTES.md
@@ -4,6 +4,10 @@ High-level notes for **shipping FlightDeck**. Detailed history: **[CHANGELOG.md]

Narrative docs (including the CLI reference) are maintained on **[github.com/flightdeckdev/flightdeck](https://github.com/flightdeckdev/flightdeck)** `main`; this file and **`schemas/`** ship in minimal clones.

## v1.0.4 — Phase 0 closing slice (pricing diagnostic, examples index, metrics)

Patch release (see **[CHANGELOG.md](CHANGELOG.md)**): **`GET /v1/metrics`** exposes additive JSON counters for operators; **`POST /v1/diff`** and **`flightdeck release diff`** add **`pricing.prices`** / a **Per-1k token prices** line when pricing or model differs, so cost deltas are easier to interpret; **[examples/README.md](examples/README.md)** ties **integration**, **CI**, and **deploy** examples into one loop; web **Run diff** shows the same unit-price deltas when present. **Stable contracts:** additive HTTP and CLI output only; no **`v1`** payload or schema removals.

## v1.0.3 — Phase 0 hardening (tests + UI)

Patch release (see **[CHANGELOG.md](CHANGELOG.md)**): broader **pytest** coverage for **`diff_releases`** (MEDIUM/LOW confidence, **`max_latency_ms`**, **`max_error_rate`**, combined failures), **CLI** integration for MEDIUM confidence blocking promotion when **`require_high_diff_confidence`** is on, **`runs ingest`** edge cases (empty file, bad JSONL, JSON array file), and **multi-provider / cross-model** **`release diff`** plus **`POST /v1/diff`** parity on **`pricing.pricing_or_model_changed`**. **Web UI:** promote/rollback responses use structured panels (raw JSON optional); **Run diff** surfaces the same pricing/model-change note as the CLI when the diff payload flags it. **Stable contracts:** no CLI flag removals, no **`v1`** schema or **`POST /v1/events`** shape changes; **HTTP** diff and action response shapes are unchanged (additive UI only on the client).
19 changes: 11 additions & 8 deletions ROADMAP.md
@@ -21,7 +21,7 @@ This roadmap is meant to be clear from **what is already shipped** to **near-ter

## Next release

**v1.0.3** (patch): Phase 0 hardening — expanded **pytest** coverage for diff confidence (MEDIUM/LOW, policy on latency and error rate), **runs ingest** edge cases (empty file, malformed JSONL, JSON array payload), and **multi-provider / cross-model** `release diff` paths; web UI structured **promote/rollback** outcome plus a **pricing/model changed** banner on **Run diff** when the API reports it. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts.
**v1.0.4** (patch): Phase 0 closing slice — **`GET /v1/metrics`** (JSON ledger counters); **`pricing.prices`** on **`POST /v1/diff`** plus CLI **Per-1k token prices** line and matching web diff detail when pricing/model changes; **[examples/README.md](examples/README.md)** end-to-end walkthrough linking **integration**, **CI**, and **deploy** examples. See **[CHANGELOG.md](CHANGELOG.md)** and **[RELEASE_NOTES.md](RELEASE_NOTES.md)**. No breaking changes to stable CLI, HTTP, or **`api_version` `v1`** contracts.

---

@@ -53,16 +53,19 @@ Goal: prove the wedge with real teams using FlightDeck as release governance sou
- Strengthen local security ergonomics: explicit token/env status in UI, mutation guardrails, optional read-only UX.
- Continue UI productization for current scope (structured views over raw JSON where stable).

### Phase 0 progress (toward v1.0.3)
### Phase 0 progress (v1.0.3–v1.0.4)

Shipped on **`main`** for the next patch:
Shipped on **`main`**:

- **Policy / diff tests:** `diff_releases` coverage for MEDIUM confidence vs `require_high_diff_confidence`, LOW sample floor boundaries, `max_latency_ms` (including skip when latency is absent), `max_error_rate`, and stacked policy failure reasons; CLI integration for MEDIUM blocking a second promotion after a baseline is established.
- **Ingest tests:** empty JSONL (zero inserts), malformed line (non-zero exit), JSON array file accepted.
- **Multi-provider pricing:** integration tests that diff baseline vs candidate releases with different **`pricing_reference`** providers (and same-provider different models), including parity checks on **`POST /v1/diff`** `pricing.pricing_or_model_changed`.
- **Web UI:** structured outcome card after promote/rollback (policy, pointer, IDs) with raw JSON in a collapsible panel; Diff summary shows pricing/model change when the server marks it.
- **Policy / diff tests (v1.0.3):** `diff_releases` coverage for MEDIUM confidence vs `require_high_diff_confidence`, LOW sample floor boundaries, `max_latency_ms` (including skip when latency is absent), `max_error_rate`, and stacked policy failure reasons; CLI integration for MEDIUM blocking a second promotion after a baseline is established.
- **Ingest tests (v1.0.3):** empty JSONL (zero inserts), malformed line (non-zero exit), JSON array file accepted.
- **Multi-provider pricing (v1.0.3):** integration tests that diff baseline vs candidate releases with different **`pricing_reference`** providers (and same-provider different models), including parity checks on **`POST /v1/diff`** `pricing.pricing_or_model_changed`.
- **Web UI (v1.0.3):** structured outcome card after promote/rollback (policy, pointer, IDs) with raw JSON in a collapsible panel; Diff summary shows pricing/model change when the server marks it.
- **Pricing diagnostics (v1.0.4):** **`pricing.prices`** on **`POST /v1/diff`** and matching CLI / web lines for per-1k input/output unit prices when pricing or model differs.
- **Operating narrative (v1.0.4):** **[examples/README.md](examples/README.md)** index tying emit → ingest → verify → diff/gate → promote → serve.
- **Observability foundation (v1.0.4):** **`GET /v1/metrics`** JSON counters over the local ledger (not Prometheus/OTel; longer arc stays mid term).

**Still open in Phase 0** (see gaps table and Phase 1 for larger items): richer **pricing normalization** product semantics (beyond per-side tables + flags), broader **integration** and **deployment** narrative in docs, and **observability** paths remain roadmap-sized rather than single-patch work.
**Still open in Phase 0** (see gaps table and Phase 1): **catalog-level** multi-provider normalization (single comparable unit across vendors), deeper **event pipeline** and **fleet** ergonomics, and **OTLP-oriented** telemetry remain beyond this patch.

### Phase-0 success signals

3 changes: 3 additions & 0 deletions docs/cli.md
@@ -244,8 +244,11 @@ When pricing or model changes between baseline and candidate, an additional note
printed:
```
NOTE: cost delta includes pricing/model assumption changes (pricing reference and/or model differ).
Per-1k token prices: input 0.005000 -> 0.004500, output 0.015000 -> 0.013500
```

The **Per-1k token prices** line shows the resolved table entry for each side’s model (input and output USD per 1k tokens), so you can separate **tariff moves** from **token volume** changes in the cost delta.
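
One way to read that line is to split a cost delta into a volume effect (token counts changed, prices held at baseline) and a price effect (candidate tokens repriced at the candidate tariff). The sketch below is an illustration under stated assumptions, not FlightDeck's cost calculation: `Prices`, `cost_usd`, and `split_cost_delta` are hypothetical names, and the token counts used with it are made up. Only the per-1k prices come from the sample output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per 1k tokens, as printed on the Per-1k token prices line."""

    input_per_1k: float
    output_per_1k: float


def cost_usd(input_tokens: int, output_tokens: int, prices: Prices) -> float:
    """Cost of one side: tokens are billed per 1k at the given unit prices."""
    return (input_tokens / 1000) * prices.input_per_1k + (
        output_tokens / 1000
    ) * prices.output_per_1k


def split_cost_delta(
    base_tokens: tuple[int, int],
    cand_tokens: tuple[int, int],
    base_prices: Prices,
    cand_prices: Prices,
) -> dict[str, float]:
    """Split (candidate cost - baseline cost) into volume and price effects.

    Token tuples are (input_tokens, output_tokens). This is a hypothetical
    helper for illustration; it is not part of the flightdeck package.
    """
    base_cost = cost_usd(*base_tokens, base_prices)
    # Candidate token volume priced at the baseline tariff.
    cand_at_base_prices = cost_usd(*cand_tokens, base_prices)
    cand_cost = cost_usd(*cand_tokens, cand_prices)
    return {
        "total": cand_cost - base_cost,
        "volume_effect": cand_at_base_prices - base_cost,
        "price_effect": cand_cost - cand_at_base_prices,
    }
```

With 1,000 input / 500 output tokens on the baseline and 1,200 / 500 on the candidate, the sample prices give a volume effect of about +0.001 USD and a price effect of about -0.00135 USD, so the total delta is about -0.00035 USD even though token volume went up.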

See [operations-and-policy.md](operations-and-policy.md) for the cost calculation and
confidence algorithm.

37 changes: 35 additions & 2 deletions docs/http-api.md
@@ -22,7 +22,7 @@ Two access tiers:
| Route | No token configured | `FLIGHTDECK_LOCAL_API_TOKEN` set |
|-------|--------------------|---------------------------------|
| `GET /health` | open | open |
| `GET /v1/*` (reads) | open | open |
| `GET /v1/*` (reads, including `GET /v1/metrics`) | open | open |
| `POST /v1/events` | open† | open (no Bearer required) |
| `POST /v1/diff` | open | open |
| `POST /v1/promote` | loopback only | `Authorization: Bearer <token>` required |
@@ -67,6 +67,31 @@ This field never includes secret material.

---

## `GET /v1/metrics`

Read-only JSON snapshot of aggregate counts in the local SQLite ledger (releases, pricing tables, run events, promotion pointers, audit actions). Intended for simple operators or scrapers; this is **not** Prometheus exposition format.

**Response**

```json
{
"counters": {
"releases_total": 3,
"pricing_tables_total": 1,
"run_events_total": 120,
"promoted_pointers_total": 1,
"actions_total": 5,
"actions_by_action": { "promote": 4, "rollback": 1 }
},
"schema_version": 3,
"generated_at": "2026-05-03T12:00:00+00:00"
}
```

`schema_version` matches the highest applied SQLite migration (`LATEST_SCHEMA_MIGRATION_VERSION` in `flightdeck.storage`).
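
A minimal client sketch for this endpoint, assuming the response shape shown above. `fetch_metrics` and `summarize_metrics` are hypothetical helpers, not part of FlightDeck, and the base URL is whatever address your `flightdeck serve` instance listens on (the one in the usage note is only a placeholder).

```python
import json
from urllib.request import urlopen

# Integer counters named in the documented response.
EXPECTED_COUNTERS = (
    "releases_total",
    "pricing_tables_total",
    "run_events_total",
    "promoted_pointers_total",
    "actions_total",
)


def fetch_metrics(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/v1/metrics and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/v1/metrics"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_metrics(payload: dict) -> str:
    """Render the counters as one log-friendly line.

    Raises KeyError if a documented counter is missing, so a scraper
    fails loudly instead of reporting partial data.
    """
    counters = payload["counters"]
    missing = [name for name in EXPECTED_COUNTERS if name not in counters]
    if missing:
        raise KeyError(f"missing counters: {missing}")
    totals = " ".join(
        f"{name.removesuffix('_total')}={counters[name]}" for name in EXPECTED_COUNTERS
    )
    by_action = counters.get("actions_by_action", {})
    actions = ", ".join(f"{k}={v}" for k, v in sorted(by_action.items()))
    return (
        f"schema v{payload['schema_version']} @ {payload['generated_at']}: "
        f"{totals} ({actions})"
    )
```

Usage would look like `print(summarize_metrics(fetch_metrics("http://127.0.0.1:8000")))`, with the host and port replaced by your own. Reads are open in both access tiers, so the sketch sends no `Authorization` header. `str.removesuffix` needs Python 3.9 or newer.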

---

## `GET /v1/releases`

List all registered releases.
@@ -297,7 +322,15 @@ the audit ledger.
"candidate_provider": "openai",
"candidate_version": "2024-05",
"candidate_model": "gpt-4o",
"pricing_or_model_changed": true
"pricing_or_model_changed": true,
"prices": {
"baseline_input_usd_per_1k_tokens": 0.005,
"baseline_output_usd_per_1k_tokens": 0.015,
"baseline_cached_input_usd_per_1k_tokens": null,
"candidate_input_usd_per_1k_tokens": 0.0045,
"candidate_output_usd_per_1k_tokens": 0.0135,
"candidate_cached_input_usd_per_1k_tokens": null
}
},
"samples": {
"baseline_runs": 1200,
22 changes: 22 additions & 0 deletions examples/README.md
@@ -0,0 +1,22 @@
# Examples index

This folder holds **copy-pasteable** references for wiring FlightDeck into a real loop: emit evidence, ingest, diff, gate in CI, promote, and run the local HTTP server. Narrative CLI and trust-boundary docs live on the [canonical repository](https://github.com/flightdeckdev/flightdeck) `main`; see also [RELEASE_NOTES.md](../RELEASE_NOTES.md) in this tree.

## End-to-end loop

1. **Emit run events** from your app or a test harness — see [integration/](integration/README.md) (`emit_sample_events.py` and `POST /v1/events` shape).
2. **Ingest** evidence: `flightdeck runs ingest <file.jsonl>` (or JSON array file), or HTTP `POST /v1/events` while `flightdeck serve` is running.
3. **Register** a release bundle: `flightdeck release register <bundle-dir>` then **`flightdeck release verify`** against the same tree before you trust the checksum.
4. **Diff and gate** in CI: `flightdeck release diff …` with **`--fail-on-policy`** when you want a non-zero exit without mutating promotion — see [ci/](ci/README.md) and `ledger_gate.py` / GitHub Actions templates.
5. **Promote or rollback** via CLI (`flightdeck release promote` / `rollback`) or HTTP `POST /v1/promote` and `POST /v1/rollback` (token + loopback rules apply).
6. **Run the server** in a container or compose stack — see [deploy/](deploy/README.md).
7. **Observe** aggregate ledger size with **`GET /v1/metrics`** (JSON counters; read-only, same access tier as other `GET /v1/*` routes).
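
For step 7, one simple check is to take a `GET /v1/metrics` snapshot before and after an ingest and compare the counters. A minimal sketch, assuming each snapshot is the decoded JSON body with a top-level `counters` object; `counter_deltas` is a hypothetical helper, not something FlightDeck ships.

```python
def counter_deltas(before: dict, after: dict) -> dict[str, int]:
    """Per-counter change between two decoded /v1/metrics payloads.

    Only integer counters are compared; nested maps such as
    actions_by_action are skipped. A counter absent from the earlier
    snapshot is treated as 0.
    """
    before_counters = before["counters"]
    after_counters = after["counters"]
    deltas: dict[str, int] = {}
    for name, value in after_counters.items():
        if isinstance(value, int):
            deltas[name] = value - before_counters.get(name, 0)
    return deltas
```

If an ingest of 30 events succeeded, `counter_deltas(before, after)["run_events_total"]` should come out as 30 while the other counters stay at 0.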

## Subfolders

| Path | Purpose |
|------|---------|
| [quickstart/](quickstart/) | Minimal workspace used by `flightdeck-quickstart-verify`. |
| [ci/](ci/README.md) | Policy gate script, sample policy YAML, GitHub Actions job snippets. |
| [deploy/](deploy/README.md) | Dockerfile and compose for `flightdeck serve`. |
| [integration/](integration/README.md) | Sample event emitter for HTTP ingest. |
2 changes: 1 addition & 1 deletion examples/ci/README.md
@@ -37,7 +37,7 @@ uv run python examples/ci/ledger_gate.py
Example (**PyPI** install):

```bash
pip install "flightdeck-ai>=1.0.3"
pip install "flightdeck-ai>=1.0.4"
export WORKSPACE="$(mktemp -d)"
export QUICKSTART_ROOT=/path/to/flightdeck/examples/quickstart
python /path/to/flightdeck/examples/ci/ledger_gate.py
2 changes: 1 addition & 1 deletion examples/ci/github-actions/policy-gate-pypi.yml
@@ -11,7 +11,7 @@ on:
env:
# Pin to a tag or SHA that matches your installed flightdeck-ai version when possible.
FLIGHTDECK_REF: main
FLIGHTDECK_AI_SPEC: ">=1.0.3"
FLIGHTDECK_AI_SPEC: ">=1.0.4"

jobs:
ledger-gate:
2 changes: 1 addition & 1 deletion examples/deploy/Dockerfile
@@ -2,7 +2,7 @@
FROM python:3.14-slim

RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir "flightdeck-ai>=1.0.3"
&& pip install --no-cache-dir "flightdeck-ai>=1.0.4"

WORKDIR /workspace

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "flightdeck-ai"
version = "1.0.3"
version = "1.0.4"
description = "AI Release Governance for production agents."
readme = "README.md"
license = "Apache-2.0"
2 changes: 1 addition & 1 deletion src/flightdeck/__init__.py
@@ -1,3 +1,3 @@
"""FlightDeck - AI Release Governance for production agents."""

__version__ = "1.0.3"
__version__ = "1.0.4"
13 changes: 13 additions & 0 deletions src/flightdeck/cli/main.py
@@ -366,6 +366,19 @@ def release_diff(
)
if result.pricing_or_model_changed:
click.echo("NOTE: cost delta includes pricing/model assumption changes (pricing reference and/or model differ).")
if (
result.baseline_input_usd_per_1k_tokens is not None
and result.candidate_input_usd_per_1k_tokens is not None
and result.baseline_output_usd_per_1k_tokens is not None
and result.candidate_output_usd_per_1k_tokens is not None
):
click.echo(
"Per-1k token prices: "
f"input {result.baseline_input_usd_per_1k_tokens:.6f} -> "
f"{result.candidate_input_usd_per_1k_tokens:.6f}, "
f"output {result.baseline_output_usd_per_1k_tokens:.6f} -> "
f"{result.candidate_output_usd_per_1k_tokens:.6f}"
)
click.echo(f"Samples: baseline={result.baseline_runs} candidate={result.candidate_runs}")
click.echo(
f"Confidence: {result.confidence}" + (f" ({result.confidence_reason})" if result.confidence_reason else "")
21 changes: 20 additions & 1 deletion src/flightdeck/operations.py
@@ -5,7 +5,7 @@
from typing import Literal
from uuid import uuid4

from flightdeck.ledger import diff_releases, parse_window
from flightdeck.ledger import diff_releases, parse_window, pricing_entry_for
from flightdeck.models import (
Policy,
PolicyResult,
@@ -33,9 +33,15 @@ class DiffOutcome:
baseline_pricing_provider: str
baseline_pricing_version: str
baseline_model: str
baseline_input_usd_per_1k_tokens: float | None
baseline_output_usd_per_1k_tokens: float | None
baseline_cached_input_usd_per_1k_tokens: float | None
candidate_pricing_provider: str
candidate_pricing_version: str
candidate_model: str
candidate_input_usd_per_1k_tokens: float | None
candidate_output_usd_per_1k_tokens: float | None
candidate_cached_input_usd_per_1k_tokens: float | None
pricing_or_model_changed: bool
baseline_runs: int
candidate_runs: int
@@ -184,6 +190,9 @@ def compute_diff(
except ValueError as e:
raise OperationError(str(e)) from e

base_entry = pricing_entry_for(base_table, base_artifact.spec.runtime.model)
cand_entry = pricing_entry_for(cand_table, cand_artifact.spec.runtime.model)

return DiffOutcome(
window=window,
since=since,
@@ -194,9 +203,19 @@ def compute_diff(
baseline_pricing_provider=base_ref.provider,
baseline_pricing_version=base_ref.pricing_version,
baseline_model=base_artifact.spec.runtime.model,
baseline_input_usd_per_1k_tokens=base_entry.input_usd_per_1k_tokens if base_entry else None,
baseline_output_usd_per_1k_tokens=base_entry.output_usd_per_1k_tokens if base_entry else None,
baseline_cached_input_usd_per_1k_tokens=(
base_entry.cached_input_usd_per_1k_tokens if base_entry else None
),
candidate_pricing_provider=cand_ref.provider,
candidate_pricing_version=cand_ref.pricing_version,
candidate_model=cand_artifact.spec.runtime.model,
candidate_input_usd_per_1k_tokens=cand_entry.input_usd_per_1k_tokens if cand_entry else None,
candidate_output_usd_per_1k_tokens=cand_entry.output_usd_per_1k_tokens if cand_entry else None,
candidate_cached_input_usd_per_1k_tokens=(
cand_entry.cached_input_usd_per_1k_tokens if cand_entry else None
),
pricing_or_model_changed=(
base_ref.provider != cand_ref.provider
or base_ref.pricing_version != cand_ref.pricing_version
2 changes: 2 additions & 0 deletions src/flightdeck/server/routes/__init__.py
@@ -4,10 +4,12 @@

from flightdeck.server.routes.actions import router as actions_router
from flightdeck.server.routes.ingest import router as ingest_router
from flightdeck.server.routes.metrics import router as metrics_router
from flightdeck.server.routes.read import router as read_router


def include_routes(app: FastAPI) -> None:
app.include_router(ingest_router)
app.include_router(read_router)
app.include_router(metrics_router)
app.include_router(actions_router)