Draft Symphony Coding-Agent Cost Telemetry Extension spec #3
Merged
Draft spec extending the OpenAI Symphony specification with a persistent, vendor-neutral on-disk format for coding-agent cost telemetry. Defines a single stream, usage.jsonl, as the projection of coding-agent activity that keeps the cost signal and discards conversation noise.

Key design choices:
- One stream only (usage.jsonl); other Symphony observability is explicitly out of scope and left for a separate extension.
- Vendor-neutral schema decoupled from coding-agent transcript formats, so one cost reader works across Claude, Codex, Gemini, etc.
- Storage conventions cover canonical location + JSONL encoding only; writer behavior (rotation, atomicity, retention, compression) is explicitly operator-defined.
- Schema versioning rule: bump on breaking change, not on additive optional fields; readers tolerate unknown fields and enum values.
- 13 required fields + 13 optional fields, every one with a documented cost-attribution use.

455 lines (about 21% of the parent spec's 2,169 lines), proportional to the narrower scope.
github-actions (bot) pushed a commit that referenced this pull request on May 27, 2026
## Summary

Implements the consumer side of [the Symphony Coding-Agent Cost Telemetry Extension spec](https://github.com/RiddimSoftware/groove/blob/main/specs/symphony-cost-telemetry-extension/SPEC.md) (PR #3) inside the `llm-cost-attribution` package (PR #2). Lets users delete their transcripts after baking the cost-relevant projection into a much smaller `usage.jsonl` file.

```bash
llm-cost backfill --out ~/llm-cost-history.jsonl            # bake transcripts → spec-compliant usage.jsonl
llm-cost EPAC-1940 --from-usage ~/llm-cost-history.jsonl    # query the bake instead of transcripts
rm -rf ~/.claude/projects ~/.codex/sessions                 # safe: cost data is in the bake now
```

## Real-world numbers (this machine, 4,309 sessions)

| | Before backfill | After backfill |
|---|---:|---:|
| Disk footprint | 5.0 GB | **83 MB** (60× smaller) |
| `llm-cost EPAC-1940` query time | ~3 min (full Codex scan) | **~0.3 s** (~600× faster) |
| EPAC-1940 token total | 52,605,306 | **52,605,306** ✓ |
| EPAC-1940 turn count | 341 | **341** ✓ |
| EPAC-1940 wall clock | 1h 53m 40s | **1h 53m 40s** ✓ |

Backfill emitted 190,481 spec-compliant records from 4,309 sessions; 1,841 sessions were correctly skipped (ad-hoc CLI work outside any Symphony workspace).

## New library API

```js
import {
  computeIssueCostFromUsage,
  backfillUsageFromTranscripts,
  readUsageRecords,
  appendUsageRecords,
  validateUsageRecord,
  sessionToUsageRecords,
  rollupUsageRecords,
  SCHEMA_VERSION,
} from 'llm-cost-attribution';
```

## New CLI surface

```
llm-cost backfill --out <path>          # transcripts → spec-compliant usage.jsonl
llm-cost <ISSUE> --from-usage <path>    # read from usage.jsonl/dir instead of transcripts
```

`--from-usage` accepts either a single `.jsonl` file or a directory of `usage*.jsonl` files (per spec §4.1's "writers MAY split, readers MUST concatenate" rule).

## Fidelity tradeoffs (called out in README)

The spec deliberately drops three things from the raw transcripts. After backfill you lose:

- Claude cache-tier split (5m vs 1h cache creation tokens): collapsed into the input total
- Codex reasoning-vs-visible output split: collapsed into the output total
- Codex `rate_limits.{primary,secondary}.used_percent` quota samples: not in the spec schema

Grand totals, per-turn ordinals, models, timestamps, runIDs, and `workspacePath` provenance are preserved exactly.

## Spec conformance (§5.1 Required fields)

Every backfilled record carries: `schemaVersion`, `recordedAt`, `runID` (UUID; the CLI session ID), `turn` (1-based monotonic), `issueIdentifier`, `provider`, `model`, `botRole` (always `developer`; spec §5.1 says "Implementations that do not distinguish a reviewer role MUST emit `developer`"), `inputTokens`, `outputTokens`, `totalTokens`, `usageSource: "provider_reported"`, `startedAt`, `endedAt`. Plus the optional `workspacePath` since we already have it.

## Test plan

- [x] All 27 package tests pass (`node --test packages/llm-cost-attribution/test/*.test.mjs`): 11 existing + 8 new in `usage-jsonl.test.mjs` + 5 new in `transcript-to-usage.test.mjs`
- [x] Every backfilled record produced from real transcripts passes `validateUsageRecord`
- [x] `llm-cost EPAC-1940` and `llm-cost EPAC-1940 --from-usage <backfill>` produce identical token totals, turn counts, and wall-clock spans
- [x] `node --check` clean on every .mjs
- [x] CI workflow updated to run the new test files

Co-authored-by: Sunny Purewal <sunny@riddimsoftware.com>
Summary
Draft v0.1 of an open-source extension to the OpenAI Symphony specification. Defines `usage.jsonl` — a persistent, vendor-neutral on-disk format for coding-agent cost telemetry — and nothing else.
Lands at `specs/symphony-cost-telemetry-extension/SPEC.md` following groove's existing `specs/` layout convention.
Framing
Provider transcripts (Claude session JSONL, Codex rollouts, etc.) are roughly 1,650× larger than `usage.jsonl` in our aggregate data and contain mostly conversation content that is irrelevant to cost. `usage.jsonl` is the projection of coding-agent activity that keeps the cost signal and discards the conversation noise — one row per turn, ~1 KB per row, vendor-neutral, joinable to the issue tracker, ships anywhere.
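As a rough illustration of why one row per turn is enough for cost attribution, the sketch below sums tokens per issue directly from `usage.jsonl` text. It is not the `llm-cost-attribution` implementation: `rollupByIssue` is a hypothetical helper, the sample rows are invented and show only a few fields, and counting a `null` `totalTokens` as "not reported" rather than zero is an assumption about the spec's null semantics.

```javascript
// Hypothetical helper: count turns and sum tokens per issue from usage.jsonl text.
// Field names come from the field summary; everything else is an assumption.
function rollupByIssue(jsonlText) {
  const totals = new Map();
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank lines and a trailing newline
    const record = JSON.parse(line);
    const key = record.issueIdentifier;
    const entry = totals.get(key) ?? { turns: 0, totalTokens: 0, unknownTurns: 0 };
    entry.turns += 1;
    if (typeof record.totalTokens === 'number') {
      entry.totalTokens += record.totalTokens;
    } else {
      entry.unknownTurns += 1; // assumed: null means "not reported", not zero
    }
    totals.set(key, entry);
  }
  return totals;
}

// Invented sample rows; the unknown `futureField` is simply carried along and ignored.
const sample = [
  { issueIdentifier: 'EPAC-1940', turn: 1, totalTokens: 1200 },
  { issueIdentifier: 'EPAC-1940', turn: 2, totalTokens: null },
  { issueIdentifier: 'EPAC-1941', turn: 1, totalTokens: 300, futureField: 'ignored' },
].map((row) => JSON.stringify(row)).join('\n') + '\n';

const rollup = rollupByIssue(sample);
console.log(Object.fromEntries(rollup));
```

Because the reader only touches the fields it needs, rows that carry additional or unrecognized fields still roll up without changes to the code.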
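The spec scopes itself tightly: it defines one stream (`usage.jsonl`), its canonical location, and its JSONL encoding, and leaves writer behavior (rotation, atomicity, retention, compression) to the operator.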
This is deliberate. Operational concerns differ by deployment; defining the data is the spec's job, defining how to write it safely is the implementation's.
Field summary
Required (13): `schemaVersion`, `recordedAt`, `runID`, `turn`, `issueIdentifier`, `provider`, `model`, `botRole`, `inputTokens`/`outputTokens`/`totalTokens` (with null semantics), `usageSource`, `startedAt`, `endedAt`.
Optional (13): `issueID`, `pullRequest` (with `headSHA`), `mode`, `effort`, `exitReason`, `promptBytes`, `estimatedTokenMethod`, `estimatedPromptInputTokens`, `estimate`, `workspacePath`, `reviewerMode`, `experimentAssignment`, `configuredWeight`/`effectiveWeight`, `cooldownReason`.
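A minimal sketch of a reader-side presence check for the required field names listed above. This is an illustration only, not the package's `validateUsageRecord`: `missingRequiredFields` is a hypothetical helper, the example values (including the `schemaVersion` string, provider, model, and timestamp format) are invented, and treating a key with a `null` value as present is an assumption based on the token fields' null semantics.

```javascript
// Required field names as listed in the field summary.
const REQUIRED_FIELDS = [
  'schemaVersion', 'recordedAt', 'runID', 'turn', 'issueIdentifier',
  'provider', 'model', 'botRole', 'inputTokens', 'outputTokens',
  'totalTokens', 'usageSource', 'startedAt', 'endedAt',
];

// Hypothetical check: return the required keys that are absent from a record.
// Assumptions: a key present with a null value counts as provided, and
// unknown extra keys are ignored rather than rejected.
function missingRequiredFields(record) {
  return REQUIRED_FIELDS.filter((name) => !(name in record));
}

// Invented example record; values are placeholders, not real telemetry.
const example = {
  schemaVersion: '0.1',
  recordedAt: '2026-05-27T12:00:05Z',
  runID: '123e4567-e89b-12d3-a456-426614174000',
  turn: 1,
  issueIdentifier: 'EPAC-1940',
  provider: 'example-provider',
  model: 'example-model',
  botRole: 'developer',
  inputTokens: 900,
  outputTokens: 300,
  totalTokens: null, // present but null: still counts as provided here
  usageSource: 'provider_reported',
  startedAt: '2026-05-27T12:00:00Z',
  endedAt: '2026-05-27T12:00:05Z',
  workspacePath: '/tmp/example-workspace', // optional field, ignored by the check
};

console.log(missingRequiredFields(example));
```

A stricter validator would also check types and enum values; this sketch only checks presence so that additive optional fields never cause a rejection.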
Size
455 lines, about 21% of the parent spec's 2,169 — proportional to the narrower scope.
History
This spec briefly landed in RiddimSoftware/software-factory#61 before being re-routed here. A companion PR removes it from software-factory to avoid a forked copy.
Test plan