From 8a1cbc87100f3e52afcb430b556555e81f617378 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sat, 2 May 2026 10:01:41 +0000 Subject: [PATCH] docs: add CLI reference and RunEvent field table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add docs/cli.md covering every flightdeck command, its flags, arguments, and exit codes — a stable public contract from v1.0.0: - Global flags and exit code table (0 success, 1 error, 2 checksum mismatch) - flightdeck init, doctor, serve with all options - release subgroup: register, list, show, verify, diff, promote, rollback, history - pricing subgroup: import (--replace/--reason semantics) and show - policy subgroup: set and show - runs subgroup: ingest (JSONL and JSON array formats) - End-to-end typical workflow section Extend docs/http-api.md with a RunEvent field reference table covering all top-level fields, metrics sub-fields, usage.model fields, and usage.tools[] entry fields with types, defaults, and descriptions. Update README.md Documentation section to link docs/cli.md and add the RunEvent reference note to the HTTP API entry. Co-authored-by: Gottam Sai Bharath --- README.md | 3 +- docs/cli.md | 472 +++++++++++++++++++++++++++++++++++++++++++++++ docs/http-api.md | 46 +++++ 3 files changed, 520 insertions(+), 1 deletion(-) create mode 100644 docs/cli.md diff --git a/README.md b/README.md index 57fce8f..a1b5f9e 100644 --- a/README.md +++ b/README.md @@ -109,7 +109,8 @@ Substitute them before ingestion, or run **`uv run flightdeck-quickstart-verify` ## Documentation -- [HTTP API reference](docs/http-api.md) — all `/v1/*` routes, request/response shapes, auth +- [CLI reference](docs/cli.md) — all commands, flags, arguments, and exit codes +- [HTTP API reference](docs/http-api.md) — all `/v1/*` routes, request/response shapes, auth, `RunEvent` field reference - [Python SDK](docs/sdk.md) — `FlightdeckClient` / `AsyncFlightdeckClient` usage guide - [Operations and policy](docs/operations-and-policy.md) — diff, promote, rollback internals; policy model and confidence tiers - [Release artifacts and pricing](docs/release-artifact.md) — `release.yaml` format, bundle layout, checksum algorithm, workspace config, pricing tables diff --git a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..5ea1619 --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,472 @@ +# FlightDeck CLI Reference + +This document covers every `flightdeck` command, its flags, arguments, and exit codes. +The CLI is a **stable public contract** from v1.0.0 — flags, exit codes, and command +names will not change in patch or minor releases. + +For the on-disk formats the CLI reads and writes see +[release-artifact.md](release-artifact.md). For the HTTP API exposed by `flightdeck +serve` see [http-api.md](http-api.md). + +## Global flags + +| Flag | Description | +|------|-------------| +| `--version` | Print the installed version and exit | +| `--help` | Print help for any command or subcommand | + +All commands require a `flightdeck.yaml` in the working directory (or the default path +`./flightdeck.yaml`). Run `flightdeck init` to create one. + +## Exit codes + +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | Error (configuration, operation, policy block, etc.) | +| `2` | Checksum mismatch (`release verify` only) | + +--- + +## `flightdeck init` + +Create a default `flightdeck.yaml` workspace config in the current directory. + +```bash +flightdeck init [--path PATH] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--path` | `flightdeck.yaml` | Target path for the config file | + +Fails with exit 1 if the file already exists. + +**Example output:** +``` +Wrote flightdeck.yaml +``` + +The generated file uses all defaults. Edit `diff.*` thresholds or `db_path` before using +in a shared repo. See [release-artifact.md § Workspace config](release-artifact.md). + +--- + +## `flightdeck doctor` + +Run read-only health checks on the local SQLite ledger. + +```bash +flightdeck doctor +``` + +No flags. Calls `Storage.migrate()` at start (idempotent) and then checks: + +| Check | What it verifies | +|-------|-----------------| +| `schema_migrations` | All migration versions through `LATEST_SCHEMA_MIGRATION_VERSION` are applied | +| `promoted_pointer::` | Every `release_id` in `promoted_releases` has a matching row in `releases` | +| `audit_seq` | `release_actions.audit_seq` is contiguous from 1 to max with no NULLs, gaps, or duplicates | + +Output format: +``` +ok schema_migrations: applied=[1, 2, 3] expected 1..3 +ok promoted_pointer:agent_support:production: release_id=rel_abc123 ok +ok audit_seq: contiguous 1..4 (4 row(s)) +Doctor: 3 check(s), all passed. +``` + +Failed checks print to stderr and exit 1. Passing exits 0. + +--- + +## `flightdeck serve` + +Start the local FlightDeck HTTP service. + +```bash +flightdeck serve [--host HOST] [--port PORT] [--reload] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `127.0.0.1` | Bind address. Non-loopback addresses print a security warning | +| `--port` | `8765` | Bind port | +| `--reload` | off | Hot-reload on source changes (development only) | + +The server exposes `/v1/*` JSON routes. See [http-api.md](http-api.md) for full route +documentation. + +**Authentication:** set `FLIGHTDECK_LOCAL_API_TOKEN` to require a Bearer token for +mutation routes (`POST /v1/promote`, `POST /v1/rollback`). See +[http-api.md § Authentication](http-api.md). + +```bash +export FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 32)" +flightdeck serve +``` + +--- + +## `flightdeck release` + +Subgroup for managing release artifacts. + +### `flightdeck release register` + +Register an immutable release artifact and print the assigned `release_id`. + +```bash +flightdeck release register PATH [--env ENV] +``` + +| Argument / Option | Description | +|-------------------|-------------| +| `PATH` | Path to a `release.yaml` file or a bundle directory containing one | +| `--env` | Override the environment label for this registration (default: `WorkspaceConfig.default_environment`) | + +Prints the new `release_id` (e.g. `rel_abc123def456`) to stdout on success. Use command +substitution to capture it: + +```bash +RELEASE_ID=$(flightdeck release register ./candidate-release --env production) +echo $RELEASE_ID +``` + +The checksum is computed at registration time and stored alongside the artifact JSON. +Use `release verify` later to confirm the on-disk files have not changed. + +### `flightdeck release list` + +List all registered releases. + +```bash +flightdeck release list +``` + +Output is tab-separated: `release_id`, `agent_id`, `version`, `environment`, +`created_at`. Ordered by creation time (newest first). + +### `flightdeck release show` + +Print a registered release record as JSON. + +```bash +flightdeck release show RELEASE_ID +``` + +Outputs the full `ReleaseRecord` including `artifact_json` (the parsed `release.yaml` +content) and the stored checksum. + +### `flightdeck release verify` + +Verify that an on-disk bundle matches the checksum stored at registration. + +```bash +flightdeck release verify RELEASE_ID --path BUNDLE_PATH +``` + +| Argument / Option | Description | +|-------------------|-------------| +| `RELEASE_ID` | The release to verify against | +| `--path` | Bundle directory or single `release.yaml` to recompute checksum from (required) | + +**Exit codes:** +- `0` — checksums match +- `1` — release not found or config error +- `2` — checksum mismatch (files changed since registration) + +```bash +flightdeck release verify rel_abc123 --path ./candidate-release +# OK: checksum matches for rel_abc123 +# sha256=e3b0c44298fc1c149afb... + +# On mismatch: exit code 2 with detail to stderr +``` + +### `flightdeck release diff` + +Compare two registered releases over a time window and print a confidence-labeled +safety diff. + +```bash +flightdeck release diff BASELINE_ID CANDIDATE_ID --window WINDOW [OPTIONS] +``` + +| Argument / Option | Description | +|-------------------|-------------| +| `BASELINE_ID` | Baseline release ID | +| `CANDIDATE_ID` | Candidate release ID | +| `--window` | **Required.** Time window: `7d`, `24h`, `30m`, etc. | +| `--env` | Filter events by environment (default: `WorkspaceConfig.default_environment`) | +| `--tenant` | Filter events by `tenant_id` | +| `--task` | Filter events by `task_id` | + +Both releases must have the same `agent_id`. Cross-agent diffs are rejected with exit 1. + +The diff is a **read-only computation** — it does not write to the audit ledger or update +any promoted pointers. + +**Example output:** +``` +Window: 7d (2026-04-24T12:00:00+00:00 .. 2026-05-01T12:00:00+00:00) +Filters: env=production tenant=* task=* +Baseline pricing: openai/2024-02 (model=gpt-4o) +Candidate pricing: openai/2024-05 (model=gpt-4o) +Samples: baseline=1200 candidate=850 +Confidence: HIGH + +Estimated model token cost/run (USD): 0.002341 -> 0.002189 (delta -0.000152, -6.50%) +Latency avg (ms): 910.50 -> 875.20 (delta -35.30) +Error rate: 0.0083 -> 0.0071 (delta -0.0012) + +Policy: PASS +``` + +When pricing or model changes between baseline and candidate, an additional note is +printed: +``` +NOTE: cost delta includes pricing/model assumption changes (pricing reference and/or model differ). +``` + +See [operations-and-policy.md](operations-and-policy.md) for the cost calculation and +confidence algorithm. + +### `flightdeck release promote` + +Evaluate active policy and promote a release to an environment. + +```bash +flightdeck release promote RELEASE_ID --env ENV --window WINDOW --reason REASON +``` + +| Argument / Option | Description | +|-------------------|-------------| +| `RELEASE_ID` | Release to promote | +| `--env` | **Required.** Target environment | +| `--window` | **Required.** Evidence window used for policy evaluation | +| `--reason` | **Required.** Rationale written to the audit ledger (non-empty) | + +The currently promoted release for `(agent_id, environment)` becomes the baseline for +the diff. If no release is currently promoted, the first promotion skips policy +evaluation and succeeds unconditionally. + +If policy passes, the promoted pointer is updated and the command exits 0. If policy +fails, the **intent is still recorded** in the audit ledger (the `release_actions` row +is written) but the pointer is not updated and the command exits 1. + +``` +Promoted rel_abc123 for agent_support/production +Policy: PASS +``` + +Policy failure output: +``` +Policy: FAIL +- candidate cost_per_run_usd 0.006000 exceeds max 0.005000 +Error: Promotion blocked by policy +``` + +### `flightdeck release rollback` + +Roll back to a prior release. Same contract as `promote` but records `"rollback"` in +the audit ledger. + +```bash +flightdeck release rollback RELEASE_ID --env ENV --window WINDOW --reason REASON +``` + +A promoted release must already exist for the agent/environment. Rolling back when +nothing is promoted exits 1. + +### `flightdeck release history` + +Show the promotion and rollback decision history from the audit ledger. + +```bash +flightdeck release history [--agent AGENT_ID] [--env ENV] +``` + +| Option | Description | +|--------|-------------| +| `--agent` | Filter by `agent_id` | +| `--env` | Filter by environment | + +Output is tab-separated per action: `created_at`, `action`, `PASS/FAIL`, `release_id`, +`baseline=`, `actor=`, `reason=`. Policy failure reasons are indented +below each action line. + +``` +2026-05-01T13:00:00+00:00 promote PASS rel_abc123 baseline=rel_prev789 actor=ci-bot reason=passed staging +``` + +--- + +## `flightdeck pricing` + +Subgroup for managing pricing tables. + +### `flightdeck pricing import` + +Import a pricing table YAML file into local storage. + +```bash +flightdeck pricing import PATH [--replace] [--reason REASON] +``` + +| Argument / Option | Description | +|-------------------|-------------| +| `PATH` | Path to a pricing table YAML file | +| `--replace` | Replace an existing `(provider, pricing_version)` table. Requires `--reason` | +| `--reason` | Rationale for the import or replacement. Required when using `--replace` | + +Without `--replace`, importing when a table for that `(provider, pricing_version)` already +exists exits 1. Every import writes an audit record to `pricing_import_audit`. + +```bash +# First import +flightdeck pricing import pricing-openai-2024-05.yaml + +# Update an existing table (audit-sensitive) +flightdeck pricing import pricing-openai-2024-05.yaml --replace --reason "corrected cached rate" +``` + +Pricing table YAML format: see [release-artifact.md § Pricing table YAML format](release-artifact.md). + +### `flightdeck pricing show` + +Show a previously imported pricing table as JSON. + +```bash +flightdeck pricing show --provider PROVIDER --version VERSION +``` + +Both flags are required. If the table does not exist, exits 1 with an error message. + +--- + +## `flightdeck policy` + +Subgroup for managing the active promotion policy. + +### `flightdeck policy set` + +Load a policy YAML file and set it as the active policy. + +```bash +flightdeck policy set PATH +``` + +The policy is validated against the `Policy` Pydantic model and stored in the +`active_policy` SQLite table. Only one policy is active at a time; setting a new one +replaces the previous. + +Example `policy.yaml`: + +```yaml +policy_id: prod-v1 +max_cost_per_run_usd: 0.005 +max_error_rate: 0.02 +max_latency_ms: 2000 +require_high_diff_confidence: true +min_candidate_runs: 200 +min_baseline_runs: 200 +min_low_runs: 20 +``` + +JSON Schema: [`schemas/v1/policy.schema.json`](../schemas/v1/policy.schema.json). + +### `flightdeck policy show` + +Print the active policy as JSON. + +```bash +flightdeck policy show +``` + +If no policy has been set, prints the default policy (all constraints `null`/disabled, +`require_high_diff_confidence: true`). + +--- + +## `flightdeck runs` + +Subgroup for ingesting run events. + +### `flightdeck runs ingest` + +Ingest `RunEvent` records from a JSONL or JSON array file. + +```bash +flightdeck runs ingest PATH +``` + +| Argument | Description | +|----------|-------------| +| `PATH` | Path to a `.jsonl` file (one JSON object per line) or a `.json` file containing a JSON array of events | + +Events with a duplicate `run_id` are silently skipped (idempotent). The command prints +the number of newly inserted events: + +``` +Inserted 47 events +``` + +**Input formats:** + +JSONL (one event per line): +```jsonl +{"api_version":"v1","type":"run_end","timestamp":"2026-05-01T12:00:00Z","agent_id":"agent_support","release_id":"rel_abc123",...} +{"api_version":"v1","type":"run_end","timestamp":"2026-05-01T12:01:00Z","agent_id":"agent_support","release_id":"rel_abc123",...} +``` + +JSON array: +```json +[ + {"api_version":"v1","type":"run_end","timestamp":"2026-05-01T12:00:00Z",...}, + {"api_version":"v1","type":"run_end","timestamp":"2026-05-01T12:01:00Z",...} +] +``` + +See [http-api.md § POST /v1/events](http-api.md) for the full `RunEvent` field reference. + +--- + +## Typical workflow + +```bash +# 1. Initialize workspace +flightdeck init + +# 2. Import pricing tables +flightdeck pricing import pricing-openai-2024-05.yaml + +# 3. Set promotion policy +flightdeck policy set policy.yaml + +# 4. Register releases +BASELINE=$(flightdeck release register ./baseline-release) +CANDIDATE=$(flightdeck release register ./candidate-release) + +# 5. Ingest run evidence (substitute placeholder release IDs first) +sed "s/__BASELINE_RELEASE_ID__/${BASELINE}/g" baseline-events.jsonl > /tmp/baseline-ev.jsonl +sed "s/__CANDIDATE_RELEASE_ID__/${CANDIDATE}/g" candidate-events.jsonl > /tmp/candidate-ev.jsonl +flightdeck runs ingest /tmp/baseline-ev.jsonl +flightdeck runs ingest /tmp/candidate-ev.jsonl + +# 6. Compare releases +flightdeck release diff "$BASELINE" "$CANDIDATE" --window 7d + +# 7. Promote if safe +flightdeck release promote "$BASELINE" --env production --window 7d --reason "initial baseline" + +# 8. View history +flightdeck release history --agent agent_support --env production + +# 9. Health check +flightdeck doctor +``` + +The `flightdeck-quickstart-verify` command (or `python -m flightdeck.quickstart_smoke`) +runs this entire workflow end-to-end using the bundled example fixtures in +`examples/quickstart/`. diff --git a/docs/http-api.md b/docs/http-api.md index c63fb94..1e0b235 100644 --- a/docs/http-api.md +++ b/docs/http-api.md @@ -185,6 +185,52 @@ Ingest `RunEvent` records (runtime evidence for diff and policy evaluation). Full field reference: [`schemas/v1/run_event.schema.json`](../schemas/v1/run_event.schema.json). +### `RunEvent` field reference + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `api_version` | `"v1"` | no (defaults to `"v1"`) | Must be `"v1"` or omitted. Any other value returns HTTP 400. | +| `type` | `"run_start"` \| `"run_end"` | no (defaults to `"run_end"`) | Event type. Only `"run_end"` events carry cost/latency data; `"run_start"` is accepted but contributes no usage. | +| `timestamp` | ISO-8601 string | **yes** | Event timestamp. Used for time-window filtering in diff queries. | +| `workspace_id` | string | no (defaults to `"ws_local"`) | Workspace identifier for multi-workspace setups. | +| `agent_id` | string | **yes** | Stable agent identifier (must match the `spec.agent.agent_id` in the registered release). | +| `release_id` | string | **yes** | The `release_id` returned by `flightdeck release register`. Links the event to a release record. | +| `run_id` | string | **yes** | Unique run identifier per workspace. Duplicate `run_id` values are silently skipped. | +| `tenant_id` | string | **yes** | Tenant identifier. Used as a filter dimension in diff queries (`--tenant`). | +| `task_id` | string | **yes** | Task type identifier. Used as a filter dimension in diff queries (`--task`). | +| `environment` | string | **yes** | Deployment environment (e.g. `"production"`, `"staging"`). Must match the environment used in promote/rollback. | +| `metrics` | object | no | Run-level performance metrics (see below). | +| `usage` | object | **yes** | Model token usage (see below). | +| `labels` | object | no | Arbitrary string key-value pairs for tagging. Not used in diff/policy computations. | +| `request` | object | no | Tracing identifiers (`session_id`, `trace_id`, `span_id`). Not used in diff/policy computations. | + +**`metrics` fields:** + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `success` | boolean | `true` | Whether the run succeeded. `false` events contribute to the error rate in diff computations. | +| `latency_ms` | integer ≥ 0 | `null` | End-to-end run latency in milliseconds. Omit if not measured; `null` events are excluded from latency averages. | +| `error_type` | string | `null` | Optional error class (e.g. `"timeout"`, `"rate_limit"`). Informational only. | + +**`usage` fields:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `usage.model.provider` | string | **yes** | LLM provider (e.g. `"openai"`). Must match the model's pricing table provider. | +| `usage.model.model` | string | **yes** | Model name (e.g. `"gpt-4o"`). Must have an entry in the release's pricing table. | +| `usage.model.input_tokens` | integer ≥ 0 | **yes** | Prompt tokens consumed. | +| `usage.model.output_tokens` | integer ≥ 0 | **yes** | Completion tokens generated. | +| `usage.model.cached_input_tokens` | integer ≥ 0 | no (default `0`) | Cached prompt tokens (used for cached rate pricing when a `cached_input_usd_per_1k_tokens` rate is set). | +| `usage.tools` | array | no | Per-tool usage entries (see below). Currently recorded but not factored into cost computations. | + +**`usage.tools[]` entry fields:** + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `tool_name` | string | **yes** | Tool identifier. | +| `invocations` | integer ≥ 0 | `0` | Number of times the tool was called. | +| `cost_units` | float ≥ 0 | `0.0` | Tool-specific cost units (vendor-defined; not yet used in cost computations). | + --- ## `POST /v1/diff`