Roadmap

FlightDeck helps teams ship AI agents safely through immutable releases, trusted release diffs backed by runtime evidence, and policy-gated promotion.

This document is strategy and ordering, not a second changelog. It moves from what is already shipped to what we are building next, why production can still feel standalone, and what stays off the table. Per-version shipping notes live in RELEASE_NOTES.md and CHANGELOG.md (see Version detail below).

Reality check: FlightDeck is intentionally local-first (CLI + SQLite + optional flightdeck serve). That keeps trust boundaries explicit; teams still supply integration glue to run it broadly in production.

Version detail: The current shipping line is v1.2.0. For SemVer-by-SemVer behavior and migrations, use RELEASE_NOTES.md and CHANGELOG.md.

What is shipped (capability snapshot)

Release registry and verification: versioned release.yaml artifacts with checksums, flightdeck release verify.
Economic + operational governance: immutable pricing imports, trusted release diff, policy-gated promote and rollback (including optional approval request/confirm when configured).
Audit trail: promotion/rollback history with stable sequencing (audit_seq) and integrity checks via doctor.
Evidence ingestion: runs ingest from JSONL/JSON arrays plus stable POST /v1/events (schemas/v1/); GET /v1/runs, runs list, optional trace_id filter, and runs export (JSONL) for operator forensics.
Local API + UI: flightdeck serve routes and shipped web bundle under src/flightdeck/server/static/; surfaces summarized in Web UI and operator experience below.
SDK and tooling: Python sync/async clients with retries/batching and flightdeck-quickstart-verify.
Bundled default pricing: convenience flightdeck-bundled-YYYY-MM tables from flightdeck init, refreshed on each minor release when upstream public list pricing changes materially. flightdeck pricing check and diff pricing.warnings guard against silent staleness; operators should still run pricing import for production truth.
Operator references: CI examples, deploy/Compose guidance, Helm and fleet examples under examples/.
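As an illustration of what checksum verification of a release artifact involves, here is a minimal Python sketch that hashes artifact bytes and compares the result to a recorded digest. It assumes SHA-256 and uses hypothetical helper names (sha256_hex, matches_recorded_checksum); it is not the implementation behind flightdeck release verify, whose algorithm and artifact layout are defined by the project itself.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_checksum(data: bytes, recorded_hex: str) -> bool:
    """Compare a freshly computed digest with the recorded one in constant time."""
    return hmac.compare_digest(sha256_hex(data), recorded_hex.strip().lower())


# Hypothetical artifact content; a real release.yaml will look different.
artifact = b"name: demo-agent\nversion: 1.0.0\n"
recorded = sha256_hex(artifact)

print(matches_recorded_checksum(artifact, recorded))         # True
print(matches_recorded_checksum(artifact + b"#", recorded))  # False
```

Any change to the bytes, even a trailing comment character, produces a different digest, which is why a checksummed artifact can act as an immutable reference for later diffs and promotions.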

Web UI and operator experience

This section records the strategic UX intent for the bundled React app (routing and components are covered in docs/web-ui.md). It is not a visual design spec; it keeps UI work aligned with evidence, diff trust, and promotion safety rather than dashboard sprawl (see AGENTS.md).

Principles

Operator-first: fewer steps to answer “Can I promote?” and “What broke?”; clarity over decoration.
Trust and safety: mutations are obvious; token/read-only posture stays visible.
Evidence over chrome: structured fields and light timelines where APIs are stable; raw JSON as an escape hatch, not the default reading path.
Density, not platforms: guided flows and scannable summaries—no APM-style UI, no charting product.

Shipped surfaces

Surface	Role	Operator outcome (intent)
Overview	Ledger / promotion snapshot, ledger metrics	See promotion posture and ledger health at a glance before opening Diff or Runs.
Diff	Release comparison, pricing / catalog / hints, policy outcome	Decide promote vs blocked with scannable economics and policy, not raw JSON first.
Runs	Forensics filters, listing, export	Narrow to the slice that explains a spike or incident without re-ingesting elsewhere.
Actions / Promote	Direct promote vs approval request/confirm, rollback	Complete an auditable promotion or rollback with clear guardrails.
Shell	Primary nav, security/status strip, optional read-only build	Trust posture (token, read-only) stays visible while navigating.

UX and UI backlog (grouped)

These map to What is next items 1, 2, and 5; ship notes stay in RELEASE_NOTES / CHANGELOG.

Outcome: an engineer can open a single run or trace view and answer “what happened on this request?” without leaving the app — Runs and forensics (web): run or trace detail (drawer or page), clearer empty and error states, optional timeline grouping by trace_id / session, export affordances consistent with server limits.
Outcome: a reviewer spots policy blocks and pricing skew in seconds — Diff comprehension: stronger scannability for policy blocks and pricing/catalog lines; surface version skew and hint copy when the API exposes it.
Outcome: an approver completes request → confirm without ambiguity — Promotion and approval: progressive disclosure for approval vs direct promote, clearer confirmation copy, pending requests table polish.
Outcome: counters on Overview are interpretable, not decorative — Overview and trust: metrics context (what a counter means), light cross-links to Diff/Runs—not a metrics dashboard product.
Outcome: the UI feels fast and accessible on a laptop — Shell and quality bar: loading states, consistent spacing and type rhythm, keyboard focus and labels, layouts that tolerate narrow viewports where cheap.
Outcome: operators see when mutations or tokens apply — Security ergonomics (UI): token/env/mutation visibility, read-only build behavior, cautious affordances for destructive actions.
Outcome: dense operator layouts stay readable without a bespoke design system — Visual system: shared typography scale, spacing rhythm, focus-visible affordances, and narrow-layout breakpoints so the operator surfaces stay legible without a separate design system product.

Explicit UI deferrals

Out of scope for the near-term web app: arbitrary third-party themes or theme marketplaces; embedded arbitrary log viewers; full observability or fleet consoles in the browser; multi-workspace UI (follows conditional Fleet / cross-workspace in What is next). A single built-in dark palette (plus system preference) aligned with operator ergonomics and brand art is not a “custom theme product”—see docs/web-ui.md — Theming and brand alignment for the phased plan vs the marketing composite.

Deferred until APIs or contracts exist (then revisit UX)

Identity for HTTP and UI beyond shared-secret Bearer: today FLIGHTDECK_LOCAL_API_TOKEN / VITE_FLIGHTDECK_LOCAL_API_TOKEN are an operator-chosen static secret for this server’s JSON API (and the bundled UI when configured). OAuth2/OIDC, per-user sessions, API key rotation, and enterprise SSO in front of flightdeck serve are not shipped in core; expect reverse-proxy or gateway patterns until a future design explicitly extends the trust model. Revisit when there is a concrete contract (token issuance, audience, rotation) that fits local-first operation.
Environment / promotion pipeline visualization (for example DEV → STAGING → PROD lanes with per-stage policy state): today the ledger uses a single environment string and CLI/API fields—not a first-class multi-stage graph. Revisit when the server exposes enough structure to render without inventing state.
Dense “evidence-first release card” on Diff (token deltas, tool lists, synthetic safety rows): ship only fields the diff and catalog payloads actually provide; expand the card when optional aggregates or provenance hooks land in the API.
README / social preview hero (marketing composites, category positioning): tracked with docs and release comms, not as a substitute for honest in-app surfaces.

Production readiness gaps (why it can feel standalone)

The table below lists the gaps between “works locally” and “easy to use across production services.”

Gap	What production-ready usually requires	FlightDeck intent
Event pipeline	Reliable `RunEvent` emission from app/agent runtimes.	Near term: reference integration examples; operator owns final runtime wiring.
CI/GitOps flow	Register → ingest → diff → gate → promote in pipelines.	Near term: maintained CI examples/templates.
Deployment unit	Repeatable `serve` packaging, health checks, process supervision.	Near term: container/compose guidance; still local-first by default.
Identity and access	Strong auth beyond loopback + optional static Bearer (operator-chosen secret for HTTP API).	Mid term: documented proxy/gateway patterns; interactive OAuth/OIDC for the bundled UI is a longer / conditional arc (see Deferred until APIs above).
Storage/availability	Backup/restore, scaling, HA story.	Operator-owned today; improve docs and patterns.
Observability integration	Correlated telemetry export and operational visibility.	Mid term: OTLP-oriented integration paths (not an APM/dashboard product).
Multi-workspace/fleet	Cross-workspace views and policy coordination.	Long term and conditional; one workspace = one ledger today.
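For the Event pipeline row, the runtime wiring an operator owns can be as small as building an authenticated POST to /v1/events. The sketch below uses only the Python standard library and constructs the request without sending it. The helper name build_event_request, the address 127.0.0.1:8000, and the payload fields are assumptions for illustration; the real event shape lives under schemas/v1/, and the shipped Python SDK clients (with retries and batching) are the better starting point for real services.

```python
import json
import os
import urllib.request
from typing import Optional


def build_event_request(
    base_url: str, event: dict, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/events request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Static shared secret sent as a Bearer token, when one is configured.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/events",
        data=json.dumps(event).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical payload; the real field set is defined under schemas/v1/.
example_event = {"trace_id": "trace-123", "note": "illustrative only"}

request = build_event_request(
    "http://127.0.0.1:8000",  # assumed address; use wherever flightdeck serve listens
    example_event,
    token=os.environ.get("FLIGHTDECK_LOCAL_API_TOKEN"),
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to urllib.request.urlopen(request) against a running server. Keeping construction separate from sending makes the emitter easy to unit test inside an application runtime.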

What is next (ordered)

Each item ties to the core promise: release integrity, runtime evidence, policy-gated promotion, and auditability (see AGENTS.md).

1. Outcome: operators pinpoint the run or trace behind a regression or cost jump from the web. Evidence and forensics (web): replay/trace-oriented views and richer export semantics on top of runs list, trace_id, and JSONL export, so operators can reason over evidence without leaving the product surface. UI details: Web UI and operator experience.
2. Outcome: economic diffs surface version and naming skew before a bad promote. Catalog lifecycle and diff diagnostics: stronger mismatch signals beyond pricing-table row presence (for example version skew hints), strengthening economic governance on diffs. UI details: Web UI and operator experience.
3. Outcome: a new service reaches register → ingest → diff → gate using maintained examples. Integration glue: maintain app runtime emitters, CI/GitOps examples, and serve deployment recipes so the path from code to gated promotion is copy-pasteable.
4. Outcome: flightdeck serve in production is boring to operate (health, restarts, backups). Serve and deployment hardening: clear operator narrative for health checks, supervision, and backup/restore alongside existing Compose/Helm references.
5. Outcome: teams using Bearer and read-only builds do not foot-gun. Security ergonomics: continue explicit token/env status, mutation guardrails, and optional read-only UI patterns for local and bounded remote use. UI details: Web UI and operator experience.
6. Outcome: correlated infra telemetry can sit next to ledger evidence without becoming an APM product. OTLP-oriented integration (mid term): documented or thin adapter-style paths for correlated telemetry; not a commitment to an in-product APM.
7. Outcome (conditional): multi-team governance without breaking one-ledger trust. Fleet / cross-workspace (conditional): broader governance surfaces only after the signals in Horizons and conditions below; default remains one workspace, one ledger.

v1.2.0 ships the Python 3.11+ floor, HTTP access tightening for ingest and read APIs when a local token is set, bundled default pricing on flightdeck init, optional PostgreSQL, runs export / filters, substantial web operator UX, and experimental flightdeck.integrations. Deeper catalog diagnostics and forensics workstreams continue under What is next; ship notes live in RELEASE_NOTES / CHANGELOG.

Horizons and conditions

Near-term committed direction

The ordered list above is the default backlog shape: deepen evidence and diff trust, reduce integration friction, and harden how serve is run—not a pivot to a hosted control plane.

Conditional directions (not committed by default)

Optional hosted or federated control plane for cross-workspace policy and read models.
Fleet-level analytics via export/read-model patterns (without turning the core into a general data warehouse).
Deeper cost attribution and vendor/tool pricing coverage as the evidence model supports it.
Provenance or supply-chain-style attestations only where they directly strengthen release trust boundaries.

When to expand scope (e.g. fleet / platform options)

Repeated external demand for cross-workspace governance.
Clear operator pain that cannot be solved with local-first patterns plus documented integrations.
Confidence that expansion does not break core trust boundaries and contract stability.

Vision (directional only, not backlog)

FlightDeck as a common release attestation reference for AI systems.
Federated policy models across teams/workspaces with auditable inheritance.
Ecosystem adapters that keep FlightDeck as a governance layer, not an agent framework.

Success and readiness signals

Use examples/README.md as a discoverability pass against these signals (not a product guarantee).

Product (PMF wedge):

Teams treat release versioning + checksum verification as the source of truth for promotion decisions.
Cost/latency/error diff output drives at least one real rollout decision (not demo-only usage).
Policy gates actively block at least one unsafe promotion in normal team workflows.
CI templates are adopted externally without local patching.
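To make the policy-gate signal above concrete, here is a minimal Python sketch of a gate that compares cost, latency, and error deltas against limits and blocks promotion when any limit is exceeded. Every name in it (DiffSummary, Policy, evaluate_gate, and the percentage fields) is hypothetical; FlightDeck's actual policy format and diff payload are defined by the project and may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffSummary:
    """Hypothetical candidate-vs-baseline deltas (not FlightDeck's diff payload)."""
    cost_delta_pct: float
    latency_delta_pct: float
    error_rate_delta_pct: float


@dataclass(frozen=True)
class Policy:
    """Hypothetical maximum allowed regressions, in percent."""
    max_cost_delta_pct: float
    max_latency_delta_pct: float
    max_error_rate_delta_pct: float


def evaluate_gate(diff: DiffSummary, policy: Policy) -> list[str]:
    """Return the violated limits; an empty list means the release is promotable."""
    checks = [
        ("cost", diff.cost_delta_pct, policy.max_cost_delta_pct),
        ("latency", diff.latency_delta_pct, policy.max_latency_delta_pct),
        ("error_rate", diff.error_rate_delta_pct, policy.max_error_rate_delta_pct),
    ]
    return [
        f"{name}: +{observed:.1f}% exceeds limit of {limit:.1f}%"
        for name, observed, limit in checks
        if observed > limit
    ]


policy = Policy(max_cost_delta_pct=10.0, max_latency_delta_pct=15.0, max_error_rate_delta_pct=0.0)
diff = DiffSummary(cost_delta_pct=22.5, latency_delta_pct=3.0, error_rate_delta_pct=0.0)

violations = evaluate_gate(diff, policy)
print("blocked" if violations else "promote", violations)
# prints: blocked ['cost: +22.5% exceeds limit of 10.0%']
```

Returning the list of violations rather than a bare boolean keeps the decision explainable, which matters when a blocked promotion needs to be recorded in an audit trail.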

Productization:

Approval-gated promotion is used in at least one end-to-end production pipeline.
At least two provider pricing sources compare cleanly in one diff workflow.
Teams can stand up and operate flightdeck serve with documented deployment guidance.

Operator experience (web):

Outcome: within one Diff + Actions pass, an operator states promote vs blocked-by-policy without opening raw JSON first.
Outcome: within about two minutes, an engineer isolates the run or trace responsible for a cost or error spike using Runs filters and export—without re-running the CLI for the same slice.
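The second outcome can also be approximated offline from a JSONL export. The following Python sketch groups exported runs by trace_id and ranks traces by total cost. Only trace_id is named in this roadmap; the cost_usd field and the cost_by_trace helper are assumptions for illustration, so adapt the field names to the actual export schema.

```python
import json
from collections import defaultdict


def cost_by_trace(jsonl_text: str) -> list[tuple[str, float]]:
    """Sum a per-run cost field by trace_id, highest total first."""
    totals: dict[str, float] = defaultdict(float)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the export
        run = json.loads(line)
        # Runs without a trace_id are bucketed together rather than dropped.
        key = run.get("trace_id") or "(no trace_id)"
        totals[key] += float(run.get("cost_usd", 0.0))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical export rows; cost_usd is an assumed field name.
sample = "\n".join([
    '{"trace_id": "t-1", "cost_usd": 0.02}',
    '{"trace_id": "t-2", "cost_usd": 0.5}',
    '{"trace_id": "t-1", "cost_usd": 0.03}',
    '{"cost_usd": 0.01}',
])

for trace_id, total in cost_by_trace(sample):
    print(f"{trace_id}\t{total:.2f}")
```

With the sample rows, t-2 ranks first because its single run costs more than both t-1 runs combined, which is the kind of answer the Runs filters are meant to give without leaving the app.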

Non-goals

Near-term exclusions match AGENTS.md (no prompt IDE, no agent framework, no gateway-by-default, no compliance-scanner product, no fine-tuning ops roadmap in core, no broad plugin system, no dashboard-heavy product before CLI/local HTTP is deeply proven). Hosted control plane and in-path traffic routing stay opt-in long-term considerations, not default posture.

References

Contracts and trust: RELEASE_NOTES.md, CHANGELOG.md, SECURITY.md
Versioning: VERSIONING.md
Contributors/org workflow: CONTRIBUTING.md
Engineering rules and doctrine: AGENTS.md
Web UI routing and components: docs/web-ui.md
