Merged
2 changes: 0 additions & 2 deletions .github/workflows/ci.yml
@@ -50,7 +50,6 @@ jobs:
- name: Playwright E2E (approval workspace)
env:
FD_E2E_FORCE_APPROVAL: "1"
PW_WEBSERVER_APPROVAL: "1"
run: |
cd web
npx playwright install chromium
@@ -127,7 +126,6 @@ jobs:
shell: bash
env:
FD_E2E_FORCE_APPROVAL: "1"
PW_WEBSERVER_APPROVAL: "1"
run: |
cd web
npx playwright install chromium
17 changes: 7 additions & 10 deletions CHANGELOG.md
@@ -8,19 +8,16 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0**

### Changed

- **Web Runs:** forensics UX — empty / offset / truncation messaging, export copy aligned to server limits, trace band rows, **View** drawer with structured fields and full event JSON, extra table columns (trace, status).
- **Web Diff:** scannable sections (policy, evidence window, pricing/catalog/hints, rollups), pre-query hint, `evaluated_at` when present; **examples** index and **integration** README link **`/#/diff`** and **`POST /v1/diff`** to the end-to-end loop.
- **Web Actions:** workspace loading skeleton; numbered steps when approval is on; pending table **Use for confirm** and **Refresh list**; clearer browser confirm copy and approval-reason placeholder.
- **Web shell / Overview:** skeleton loading instead of plain “Loading…”; **Refresh** disabled while loading; ledger metrics line with links to **Diff** and **Runs**; per-metric hint lines; Diff query card **`aria-busy`** while computing.
- **Web Runs:** optional **Group by trace_id** (collapsible `<details>` per trace); **View** uses **`aria-haspopup="dialog"`**.
- **Web Diff:** warn when imported **pricing table versions** or **providers** differ between baseline and candidate (same `pricing` block as before).
- **Web security strip:** loading state with **`aria-busy`** while **`/health`** is fetched.
- **Web Actions:** **Rollback** uses danger-styled button (still same confirm + API).
- **Examples / deploy / SECURITY:** [examples/README.md](examples/README.md) step 7 and readiness row mention **`/#/runs`** grouping; [examples/deploy/README.md](examples/deploy/README.md) operator checklist; Compose **`restart: unless-stopped`** on the reference service; **[SECURITY.md](SECURITY.md)** links the deploy guide for operational hardening.
- **Playwright:** `e2e-server.mjs` enables **`promotion_requires_approval`** only when **`PW_FORCE_APPROVAL_WORKSPACE=1`** (set from the CLI target or **`PW_WEBSERVER_APPROVAL`**); default suite no longer breaks on a stray **`FD_E2E_FORCE_APPROVAL`** shell export; **`reuseExistingServer: false`** for a clean workspace each run; **[web/README.md](web/README.md)** documents approval vs default runs.
- **Web Runs:** forensics — empty / offset / truncation messaging, export copy, trace band rows or **Group by trace_id**, **View** drawer (structured fields + full JSON, **session_id** / **span_id**, focus trap + return focus, **`aria-haspopup="dialog"`**), trace/status columns; **run-query** failures show a typed error card with **Retry**.
- **Web Diff:** scannable sections (policy, evidence window, pricing/catalog/hints, rollups), pre-query hint, `evaluated_at` when present; warn when imported **pricing table versions** or **providers** differ baseline vs candidate.
- **Web Actions:** workspace loading skeleton; numbered approval steps; pending **Refresh list** / **Use for confirm**; clearer confirms; approval-reason placeholder; **Rollback** danger-styled; **Actions** shows whether **`VITE_FLIGHTDECK_LOCAL_API_TOKEN`** is set (no value) and an inline hint when the server uses **Bearer** and the UI token is missing.
- **Web shell / Overview / CSS:** **Langfuse-style** left sidebar + main column (stacks on narrow viewports); skeleton loading on first load; **Overview** auto-polls timeline + metrics every **30s** when the tab is visible (silent refresh; no manual **Refresh** button); updates after **Actions** mutations via context; ledger metrics hints + links to **Diff** / **Runs**; Diff query **`aria-busy`**; **Security strip** `/health` loading + **Bearer** + client-token reassurance line; shared **focus-visible** / type scale / narrow breakpoints; **skip to main** (HashRouter-safe); **[ROADMAP.md](ROADMAP.md)** adds **Visual system** backlog item and theme deferral.
- **Examples / deploy / SECURITY / web README:** [examples/README.md](examples/README.md) end-to-end loop + **UI polish / operator flow** blurb; deploy checklist + **`restart: unless-stopped`**; **[SECURITY.md](SECURITY.md)** deploy pointer; **[web/README.md](web/README.md)** Playwright approval vs default runs.
- **Playwright:** `e2e-server.mjs` gates approval workspace on **`PW_FORCE_APPROVAL_WORKSPACE`** (set from config); **`reuseExistingServer: false`**; config sets approval workspace only when the CLI lists **exactly one** `e2e/*.spec.ts` path and it is **`actions-approval.spec.ts`** (avoids multi-spec argv; **`PW_WEBSERVER_APPROVAL`** no longer toggles the server so a stale value cannot break **`npm run test:e2e`**); **`actions-approval.spec.ts`** skips when **`GET /v1/workspace`** shows approval off (e.g. full suite with **`FD_E2E_FORCE_APPROVAL=1`**).

### Added

- **PostgreSQL ledger:** optional **`database_url`** in **`flightdeck.yaml`** (`postgresql://` or `postgres://`); install **`psycopg`** with **`uv sync --extra postgres`** (or **`pip install 'flightdeck-ai[postgres]'`**). Same schema migrations and API behavior as SQLite; run filters use **`::json`** predicates on **`event_json`**. **`flightdeck doctor --backup`** stays SQLite-only (use **`pg_dump`** for Postgres). Optional integration tests: **`FLIGHTDECK_TEST_POSTGRES_URL`** with the **`postgres`** extra.
- **`GET /v1/runs/export`** — NDJSON stream of the same filtered slice as **`GET /v1/runs`** (optional response headers when truncated).
- **`session_id`** / **`span_id`** query filters on **`GET /v1/runs`**, matching CLI/SDK, and **`offset`** pagination on run listings (with **`runs list`** / **`runs export`**).
- **Web Runs** page — query **`GET /v1/runs`** from the bundled UI.
5 changes: 4 additions & 1 deletion DEVELOPMENT.md
@@ -18,7 +18,10 @@ uv sync --extra dev

This creates **`.venv/`** (gitignored), installs **`flightdeck`** editable plus **pytest** and **ruff**, and pins versions from **`uv.lock`**.

Optional extras (telemetry, SDK helpers): e.g. **`uv sync --extra dev --extra telemetry`**.
Optional extras (telemetry, SDK helpers, PostgreSQL driver): e.g.
**`uv sync --extra dev --extra telemetry`** or **`uv sync --extra dev --extra postgres`**
for **`database_url`** / optional **`tests/test_storage_postgres.py`** runs (**`FLIGHTDECK_TEST_POSTGRES_URL`**).
Local driver test helper (Docker optional): **`scripts/run_postgres_tests.ps1`** (see script header).
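
The optional PostgreSQL tests are gated on **`FLIGHTDECK_TEST_POSTGRES_URL`**. The sketch below shows one way such gating can look. It is not the contents of `tests/test_storage_postgres.py`: the class and test names are hypothetical, and it uses the standard library's `unittest` (which pytest also collects) so that it runs without extra packages.

```python
import os
import unittest

# Read the DSN once at import time; empty means "no PostgreSQL available".
POSTGRES_URL = os.environ.get("FLIGHTDECK_TEST_POSTGRES_URL", "")


@unittest.skipUnless(
    POSTGRES_URL,
    "set FLIGHTDECK_TEST_POSTGRES_URL to run PostgreSQL ledger tests",
)
class PostgresLedgerSmokeTest(unittest.TestCase):  # hypothetical test class
    def test_url_uses_postgres_scheme(self):
        # Only the two DSN schemes named in the docs are accepted.
        self.assertTrue(POSTGRES_URL.startswith(("postgresql://", "postgres://")))
```

With the variable unset, the whole class is reported as skipped rather than failed, so the default test run stays green on machines without a database.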

### Package extras

3 changes: 2 additions & 1 deletion ROADMAP.md
@@ -53,10 +53,11 @@ These map to **What is next** items **1**, **2**, and **5**; ship notes stay in
4. **Overview and trust** — Metrics **context** (what a counter means), light cross-links to Diff/Runs—not a metrics dashboard product.
5. **Shell and quality bar** — **Loading** states, consistent spacing and type rhythm, keyboard **focus** and labels, layouts that tolerate narrow viewports where cheap.
6. **Security ergonomics (UI)** — Token/env/mutation visibility, read-only build behavior, cautious affordances for destructive actions.
7. **Visual system** — Shared typography scale, spacing rhythm, **focus-visible** affordances, and narrow-layout breakpoints so the operator surfaces stay legible without a separate design system product.

**Explicit UI deferrals**

Out of scope for the near-term web app: theme marketplaces; embedded arbitrary log viewers; full observability or fleet consoles in the browser; multi-workspace UI (follows conditional **Fleet / cross-workspace** in **What is next**).
Out of scope for the near-term web app: custom themes or theme marketplaces; embedded arbitrary log viewers; full observability or fleet consoles in the browser; multi-workspace UI (follows conditional **Fleet / cross-workspace** in **What is next**).

---

13 changes: 9 additions & 4 deletions docs/cli.md
@@ -62,21 +62,26 @@ Wrote flightdeck.yaml
```

The generated file uses all defaults. Edit `diff.*` thresholds or `db_path` before using
in a shared repo. See [release-artifact.md § Workspace config](release-artifact.md).
in a shared repo. For **PostgreSQL**, set **`database_url`** to a `postgresql://…` (or
`postgres://…`) DSN and install **`psycopg`** (`uv sync --extra postgres`); **`db_path`**
is ignored when **`database_url`** is set. **`flightdeck doctor --backup`** remains
SQLite-only. See [release-artifact.md § Workspace config](release-artifact.md).

---

## `flightdeck doctor`

Run read-only health checks on the local SQLite ledger.
Run read-only health checks on the workspace ledger (SQLite file or PostgreSQL when
**`database_url`** is configured).

```bash
flightdeck doctor [--backup PATH]
```

Calls `Storage.migrate()` at start (idempotent). With **`--backup PATH`**, runs an SQLite
online backup of the workspace database to **`PATH`** (parent directories are created;
an existing file is overwritten), then runs the checks below.
online backup of the workspace database to **`PATH`** when the workspace uses SQLite
(**`--backup`** is rejected for PostgreSQL ledgers; use **`pg_dump`** instead). Parent
directories are created and an existing file is overwritten; the checks below then run.

Without **`--backup`**, only the checks run. In both cases **`migrate()`** runs first.
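
For illustration, here is a minimal sketch of an SQLite online backup with the directory and overwrite behavior described above. It assumes the standard library's `sqlite3.Connection.backup`; the source does not show how `flightdeck doctor` implements this, and the function name `backup_sqlite` is hypothetical.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path using the online backup API."""
    dest = Path(dest_path)
    # Create missing parent directories for the backup target.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Remove any existing file so the backup replaces it outright.
    dest.unlink(missing_ok=True)

    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(str(dest))
    try:
        # Connection.backup copies the database while the source stays usable.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A call such as `backup_sqlite(".flightdeck/flightdeck.db", "backups/ledger.db")` would create `backups/` if needed. PostgreSQL ledgers are outside the scope of this sketch, since the command rejects `--backup` for them.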

4 changes: 3 additions & 1 deletion docs/operations-and-policy.md
@@ -39,8 +39,10 @@ maps these to `click.ClickException`; the HTTP layer maps them to HTTP 400.
`server/app.py` registers a FastAPI **lifespan** handler that runs at startup:

```python
from flightdeck.storage import storage_from_config

cfg = load_config() # reads flightdeck.yaml from cwd
storage = Storage(cfg.db_path)
storage = storage_from_config(cfg)
storage.migrate()
app.state.cfg = cfg
app.state.storage = storage
11 changes: 8 additions & 3 deletions docs/release-artifact.md
@@ -142,7 +142,9 @@ look for this file in the current working directory.
```yaml
api_version: v1
kind: WorkspaceConfig
db_path: .flightdeck/flightdeck.db # SQLite database path
db_path: .flightdeck/flightdeck.db # SQLite database path (default when database_url unset)
# Optional: ledger on PostgreSQL (requires psycopg; install flightdeck-ai[postgres])
# database_url: postgresql://user:pass@localhost:5432/flightdeck
default_environment: local # default environment for register/diff/promote
diff:
min_candidate_runs: 500 # HIGH confidence threshold (candidate side)
@@ -154,8 +156,11 @@ diff:
# promotion_requires_approval: false
```

All fields have defaults; an empty `flightdeck.yaml` is valid. `db_path` accepts any
relative or absolute path — the parent directory is created automatically on first use.
All fields have defaults; an empty `flightdeck.yaml` is valid. **`db_path`** accepts any
relative or absolute SQLite path — the parent directory is created automatically on first use.
When **`database_url`** is a `postgresql://` (or `postgres://`) DSN, the ledger uses that
database instead and **`db_path`** is ignored for storage. **`flightdeck doctor --backup`**
works only with SQLite; use **`pg_dump`** to back up a Postgres ledger.
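
The precedence between **`database_url`** and **`db_path`** can be sketched as follows. This is an illustration only, not the project's `storage_from_config`: the function name `select_ledger`, its return shape, and the choice to raise on an unrecognized scheme are assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# The two DSN schemes the docs name for PostgreSQL.
POSTGRES_SCHEMES = {"postgresql", "postgres"}


def select_ledger(
    database_url: Optional[str],
    db_path: str = ".flightdeck/flightdeck.db",
) -> Tuple[str, str]:
    """Return (backend, location); database_url wins over db_path when set."""
    if database_url:
        scheme = urlsplit(database_url).scheme.lower()
        if scheme in POSTGRES_SCHEMES:
            return ("postgres", database_url)
        # Assumption: anything else is treated as a configuration error.
        raise ValueError(f"unsupported database_url scheme: {scheme!r}")
    # No database_url: fall back to the SQLite file at db_path.
    return ("sqlite", db_path)
```

For example, `select_ledger("postgresql://user:pass@localhost:5432/flightdeck")` selects the Postgres backend, and `select_ledger(None)` falls back to the default SQLite path.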

**`pricing_catalog_path`** — optional path to a [`PricingCatalog`](../schemas/v1/pricing_catalog.schema.json) YAML
(relative to the workspace cwd or absolute). When set, diffs include additive `pricing.catalog` / `pricing.hints`.
53 changes: 31 additions & 22 deletions docs/web-ui.md
@@ -18,9 +18,9 @@ The app uses **HashRouter** (`react-router-dom`) so all navigation stays within

| Hash path | Component | HTTP calls | Notes |
|-----------|-----------|-----------|-------|
| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`, `GET /v1/metrics` (parallel where applicable) | Ledger metrics (read-only); short per-counter hints; skeleton while loading; links to Diff/Runs |
| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`, `GET /v1/metrics` (parallel where applicable) | Ledger metrics (read-only); short per-counter hints; skeleton on first load; **auto-refresh** every 30s when the tab is visible + on timeline **`generation`** bump; links to Diff/Runs |
| `#/diff` | `DiffPage` | `POST /v1/diff` | Sections: policy gate (incl. `evaluated_at`), evidence window, pricing/catalog/hints (incl. provider/version skew callout when sides differ), per-1k prices when present, cost/quality rollups; raw JSON panel |
| `#/runs` | `RunsPage` | `GET /v1/releases` (for datalist), `GET /v1/runs`, `GET /v1/runs/export` | Forensics: filters, table (trace/status, trace band rows or **Group by trace_id**), **View** drawer, empty/offset/truncation hints, NDJSON download |
| `#/runs` | `RunsPage` | `GET /v1/releases` (for datalist), `GET /v1/runs`, `GET /v1/runs/export` | Forensics: filters, table (trace/status, trace band rows or **Group by trace_id**), **View** drawer (focus trap, session/span ids), typed **run-query error** card with **Retry**, empty/offset/truncation hints, NDJSON download |
| `#/actions` | `ActionsPage` | `GET /v1/workspace`, `GET /v1/promotion-requests` (when `promotion_requires_approval`), `POST /v1/promote` **or** `POST /v1/promote/request` + `POST /v1/promote/confirm`, `POST /v1/rollback` | Workspace skeleton then strip; approval path: numbered steps, pending **Refresh list** / **Use for confirm**; **Rollback** danger-styled; see **ActionsPage** below |
| `#/*` (any other) | — | Redirects to `#/` | |

Expand All @@ -37,23 +37,28 @@ promote/rollback capability should be unavailable regardless of network placemen

```
App (HashRouter)
└── AppShell (layout: header + nav)
└── AppShell (layout: left sidebar + main column)
└── TimelineRefreshProvider (context)
── SecurityStatusBar (below header, above main content)
├── OverviewPage (route: #/)
├── DiffPage (route: #/diff)
├── RunsPage (route: #/runs)
└── ActionsPage (route: #/actions; redirects → #/ when UI_READ_ONLY)
── div.fd-shell
├── aside.fd-sidebar (brand + primary nav)
└── div.fd-shell__content
├── SecurityStatusBar
└── main#main-content → OverviewPage | DiffPage | RunsPage | ActionsPage
```

---

## `AppShell` (`web/src/components/AppShell.tsx`)

Renders the top header with brand name and primary nav links, then an `<Outlet>` for the
active page. Wraps the entire subtree in `TimelineRefreshProvider` so any descendant can
access the refresh context. Mounts `SecurityStatusBar` between the header and the main
content area.
Renders a fixed-width **left sidebar** (`aside.fd-sidebar`) with brand and vertical primary
nav (Langfuse-style rail), then a **`fd-shell__content`** column with `SecurityStatusBar` and
`<main>` wrapping an `<Outlet>` for the active page. On narrow viewports the sidebar stacks
above the content with a horizontal nav row. Wraps the subtree in `TimelineRefreshProvider`
so any descendant can access the refresh context.

A **Skip to main content** link (class `fd-skip-link`) appears first in the shell; it uses
`preventDefault` + `focus()` on `#main-content` so **HashRouter** hash URLs (`#/…`) are not
replaced by a fragment-only `href`.

Nav links use `NavLink` from `react-router-dom` with an `fd-nav__link--active` class applied
when the route is active. The **Promote** nav link is suppressed when `UI_READ_ONLY` is
@@ -105,7 +110,8 @@ Build-time configuration helpers read from `import.meta.env`:

## `SecurityStatusBar` (`web/src/components/SecurityStatusBar.tsx`)

Mounted by `AppShell` between the header and the main content area. Fetches `GET /health`
Mounted by `AppShell` at the top of the main content column (beside the sidebar on wide
layouts, below it when the sidebar stacks on narrow viewports). Fetches `GET /health` on
mount to read `mutation_auth` (`"bearer"` or `"loopback"`), then renders an info or
warning strip:

@@ -141,9 +147,10 @@ Read-only dashboard. Renders a **Ledger metrics** card from `fetchMetrics()` plu
Long IDs are abbreviated with `shortId(id, keepStart, keepEnd)` and shown in full on hover
via the HTML `title` attribute.
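
The abbreviation rule can be sketched in a few lines. The real `shortId` lives in the TypeScript UI and is not shown in the source, so the Python port below is illustrative: the default lengths, the ellipsis character, and the "leave short IDs alone" rule are all assumptions.

```python
def short_id(value: str, keep_start: int = 8, keep_end: int = 4) -> str:
    """Abbreviate a long ID to its first and last characters."""
    # Assumption: IDs that would not get shorter are returned unchanged.
    if len(value) <= keep_start + keep_end + 1:
        return value
    # Guard keep_end == 0, because value[-0:] would return the whole string.
    tail = value[-keep_end:] if keep_end > 0 else ""
    return f"{value[:keep_start]}…{tail}"
```

The full value would still be needed separately for the hover `title`, since the abbreviated form drops the middle of the ID.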

**Refresh:** a manual **Refresh** button in the page header calls `loadTimeline()` directly.
The `generation` counter from `TimelineRefreshContext` also triggers automatic refreshes
after mutations from `ActionsPage`.
**Refresh:** while the document tab is visible, the page **auto-polls** metrics and the
timeline on an interval and uses **silent** fetches after the first load. The `generation`
counter from `TimelineRefreshContext` triggers an immediate refresh after mutations from
`ActionsPage`.

---

@@ -357,8 +364,9 @@ All tokens are CSS custom properties on `:root`:

| Token | Purpose |
|-------|---------|
| `--fd-bg` | Page background |
| `--fd-surface` | Card / header background |
| `--fd-bg` | Main column page background |
| `--fd-surface` | Card / sidebar rail background |
| `--fd-sidebar-width` | Width of the left navigation rail (wide layouts) |
| `--fd-surface-2` | Secondary surface (hover, code blocks) |
| `--fd-border` | Standard border |
| `--fd-border-strong` | Input and button borders |
@@ -378,9 +386,10 @@ All tokens are CSS custom properties on `:root`:

| Class | Description |
|-------|-------------|
| `fd-shell` | Full-height flex container for header + main |
| `fd-header` | Sticky top bar |
| `fd-nav__link` | Navigation link; `--active` modifier for current route |
| `fd-shell` | Full-height row: sidebar + main column |
| `fd-sidebar` | Left rail: brand block + `fd-sidebar__nav` primary links |
| `fd-shell__content` | Flex column: security strip + `fd-main` |
| `fd-nav__link` | Sidebar nav link; `--active` modifier (accent left border) |
| `fd-main` | Page content area with max-width and padding |
| `fd-page-head` | Flex row with title/subtitle and optional action button |
| `fd-card` | White surface card with border and shadow |
@@ -393,7 +402,7 @@ All tokens are CSS custom properties on `:root`:
| `fd-field` | Label + input pair; `--full` modifier spans both grid columns |
| `fd-input` | Styled text input |
| `fd-alert` | Inline alert box; `--error`, `--info`, `--warn` modifiers |
| `fd-security-strip` | Full-width strip below the header; wraps `SecurityStatusBar` output |
| `fd-security-strip` | Strip at top of main column; wraps `SecurityStatusBar` output |
| `fd-security-strip__msg` | Message paragraph inside the security strip (zero margin) |
| `fd-json-panel` | Collapsible JSON viewer container |
| `fd-metric-grid` | Grid of metric cards for diff output |
2 changes: 2 additions & 0 deletions examples/README.md
@@ -12,6 +12,8 @@ This folder holds **copy-pasteable** references for wiring FlightDeck into a rea
6. **Run the server** in a container or compose stack — see [deploy/](deploy/README.md). The bundled UI calls **`GET /v1/workspace`** to choose direct promote vs request/confirm.
7. **Triage runs** with **`flightdeck runs list`** / **`runs export`** or **`GET /v1/runs`**, and **observe** aggregate ledger size with **`GET /v1/metrics`** (JSON counters; read-only, same access tier as other `GET /v1/*` routes). With **`flightdeck serve`**, **`/#/runs`** adds optional **Group by trace_id** (collapsible sections) on top of the same API slice.

**UI polish / operator flow:** See [docs/web-ui.md](../docs/web-ui.md) for routing and surfaces. In the bundled app, prefer **Diff** for policy and pricing conclusions, **Runs** for trace-scoped triage, and **Actions** for promote and rollback so operators rarely need raw JSON first.

## Readiness checklist (quick pass)

Use this as a **discoverability** pass for the **[ROADMAP.md](../ROADMAP.md)** success and readiness signals (not a product guarantee):
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -49,6 +49,9 @@ dev = [
"pytest>=7.0",
"ruff==0.15.12",
]
postgres = [
"psycopg[binary]>=3.2",
]

[project.urls]
Homepage = "https://github.com/flightdeckdev/flightdeck"