Add provider-specific OPTIONAL fields to usage.jsonl + drop spec-requires framing #5
Merged
riddim-developer-bot[bot] merged 1 commit on May 27, 2026
Per-issue cost attribution loses too much signal if the bake-to-
usage.jsonl path drops Codex quota readouts, Codex reasoning-output,
and Claude cache-tier splits. The Symphony Coding-Agent Cost
Telemetry Extension spec allows OPTIONAL field additions without
bumping schemaVersion (per its own §6.3), so this PR adds them.
Spec additions (specs/symphony-cost-telemetry-extension/SPEC.md):
§5.2.1 Input-Token Breakdown
inputUncachedTokens, inputCachedReadTokens, inputCacheWriteTokens,
inputCacheWriteEphemeral5mTokens, inputCacheWriteEphemeral1hTokens
(the last two are Anthropic-only; non-Anthropic writers MUST omit)
§5.2.2 Output-Token Breakdown
outputVisibleTokens, outputReasoningTokens
(reasoning is Codex/o-series-only; other providers MUST omit)
§5.2.3 Quota Sample
quota: { planType, windows: [{label, windowMinutes, usedPercent, resetsAt?}] }
Generic shape — Codex uses `primary` (5h) and `secondary` (7d)
labels but any provider with any number of windows fits the shape.
§5.3 Semantics
SHOULD relations for the breakdown sums.
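A minimal sketch of how a reader could check the §5.3 SHOULD relations. The exact relations are not quoted in this PR, so the three sums below are assumptions inferred from the field names, and `checkBreakdownSums` is a hypothetical helper rather than part of the package. Because the relations are SHOULD-level, the sketch returns warnings instead of rejecting the record.

```javascript
// Assumed relations (not quoted from SPEC.md §5.3): each total SHOULD equal
// the sum of whichever OPTIONAL parts the writer emitted.
const RELATIONS = [
  { total: "inputTokens", parts: ["inputUncachedTokens", "inputCachedReadTokens", "inputCacheWriteTokens"] },
  { total: "outputTokens", parts: ["outputVisibleTokens", "outputReasoningTokens"] },
  { total: "inputCacheWriteTokens", parts: ["inputCacheWriteEphemeral5mTokens", "inputCacheWriteEphemeral1hTokens"] },
];

// Hypothetical helper: returns a list of human-readable warnings.
function checkBreakdownSums(record) {
  const warnings = [];
  for (const { total, parts } of RELATIONS) {
    const present = parts.filter((name) => typeof record[name] === "number");
    // Nothing to compare when the total or every part is absent.
    if (typeof record[total] !== "number" || present.length === 0) continue;
    const sum = present.reduce((acc, name) => acc + record[name], 0);
    if (sum !== record[total]) {
      warnings.push(`${total}=${record[total]} but ${present.join("+")}=${sum}`);
    }
  }
  return warnings;
}
```

Omitted parts are treated as zero here, which matches the "MUST omit" wording for provider-specific fields but is still an interpretation.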
Package implementation:
- transcripts/codex.mjs now emits per-turn quota samples in the
spec shape directly (no lossy reshape later).
- transcript-to-usage.mjs emits every breakdown field the
transcript carries plus the quota object.
- usage-aggregator.mjs prefers breakdown fields when present and
falls back to inputTokens/outputTokens totals when not.
- bin/llm-cost.mjs's quota printer is now generic over windows
(renders every label the provider reports, not hard-coded to
primary/secondary).
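The prefer-breakdown-then-fall-back behaviour described for usage-aggregator.mjs could look roughly like the sketch below. `sumIfPresent` and `aggregateTokens` are hypothetical names, and summing the breakdown buckets to get a per-record total is an assumption about what "prefers" means; only the field names come from the spec additions above.

```javascript
// Sum the named OPTIONAL fields; return null when the record carries none of them.
function sumIfPresent(record, names) {
  const present = names.filter((n) => typeof record[n] === "number");
  if (present.length === 0) return null;
  return present.reduce((acc, n) => acc + record[n], 0);
}

// Hypothetical aggregator: breakdown fields win, REQUIRED totals are the fallback.
function aggregateTokens(records) {
  const totals = { input: 0, output: 0, cachedRead: 0, reasoning: 0 };
  for (const r of records) {
    const input = sumIfPresent(r, ["inputUncachedTokens", "inputCachedReadTokens", "inputCacheWriteTokens"]);
    const output = sumIfPresent(r, ["outputVisibleTokens", "outputReasoningTokens"]);
    totals.input += input ?? r.inputTokens ?? 0;
    totals.output += output ?? r.outputTokens ?? 0;
    // Splits only exist on records that carry the breakdown.
    totals.cachedRead += r.inputCachedReadTokens ?? 0;
    totals.reasoning += r.outputReasoningTokens ?? 0;
  }
  return totals;
}
```

Records written before this change (totals only) and records with the new fields can then be mixed in one file without a schemaVersion bump.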
Other fixes folded in (the broken "Symphony Telemetry Extension
Spec" framing for the workspace convention was still on main — the
per-issue-workspace requirement is in OpenAI Symphony's parent
SPEC.md §4.1.4, not an extension):
- DEFAULT_CWD_PATTERN broadened to match both the spec-default
`<system-temp>/symphony_workspaces/<ID>` and the common in-repo
`<repo>/.symphony/workspaces/<ID>` workspace.root settings.
- Issue-ID character class widened from [A-Z]+-\d+ (Linear-only)
to [A-Za-z0-9._-]+ (matches the spec's workspace_key
sanitization rule).
- README sections rewritten: usage.jsonl bake is presented as a
built-in feature of the package; spec interop is mentioned only
as an optional side-benefit. No prose claims that any extension
spec is required to use llm-cost.
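To illustrate the broadened cwd matching and the wider issue-ID character class, here is a sketch of a pattern that accepts both workspace layouts. `CWD_PATTERN_SKETCH` and `issueIdFromCwd` are hypothetical; the package's actual DEFAULT_CWD_PATTERN may be written differently, and this version only handles forward-slash paths.

```javascript
// Hypothetical reconstruction, not the package's real DEFAULT_CWD_PATTERN.
// Matches .../symphony_workspaces/<ID> and .../.symphony/workspaces/<ID>,
// where <ID> uses the widened class [A-Za-z0-9._-]+.
const CWD_PATTERN_SKETCH =
  /(?:^|\/)(?:symphony_workspaces|\.symphony\/workspaces)\/([A-Za-z0-9._-]+)(?:\/|$)/;

// Returns the workspace ID embedded in a cwd, or null when the cwd is not a workspace.
function issueIdFromCwd(cwd) {
  const match = CWD_PATTERN_SKETCH.exec(cwd);
  return match ? match[1] : null;
}

console.log(issueIdFromCwd("/tmp/symphony_workspaces/EPAC-1940"));
console.log(issueIdFromCwd("/home/me/repo/.symphony/workspaces/issue_42.a/src"));
console.log(issueIdFromCwd("/home/me/repo/src"));
```

The second path would not have matched under the old `[A-Z]+-\d+` class, which is the case the widening is meant to cover.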
End-to-end verified on real data (4,309 sessions / 5 GB transcripts):
- Backfill: 190,481 spec-compliant records in a 125 MB file (was
83 MB before the optional-field additions — the extra ~50% is
the breakdown + quota payload). Still 40x smaller than the 5 GB
source.
- Read-back: `llm-cost EPAC-1940 --from-usage <backfilled-file>`
now produces output IDENTICAL to the transcript-source path,
including the Codex 5h/7d quota readout (58 -> 64% / 56 -> 57%),
the cache-read 51M-token split, and the reasoning-output 18,649
split.
- 0.3s query time vs ~3min for transcript scan, unchanged.
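The quota readout mentioned above (58 -> 64% / 56 -> 57%) suggests a first-to-last rendering per window. A minimal sketch of a printer that is generic over `windows`, assuming the readout shows the first and last `usedPercent` seen for each label; `formatWindow` and `renderQuota` are hypothetical names, and the output format is illustrative rather than the CLI's actual text.

```javascript
// Render a window length such as 300 -> "5h" or 10080 -> "7d".
function formatWindow(minutes) {
  if (minutes % 1440 === 0) return `${minutes / 1440}d`;
  if (minutes % 60 === 0) return `${minutes / 60}h`;
  return `${minutes}m`;
}

// samples: per-turn quota objects in the spec shape, in chronological order.
function renderQuota(samples) {
  const byLabel = new Map(); // label -> { windowMinutes, first, last }
  for (const sample of samples) {
    for (const w of sample.windows ?? []) {
      const seen = byLabel.get(w.label);
      if (seen) {
        seen.last = w.usedPercent;
      } else {
        byLabel.set(w.label, { windowMinutes: w.windowMinutes, first: w.usedPercent, last: w.usedPercent });
      }
    }
  }
  // One line per label the provider reported; nothing is hard-coded to primary/secondary.
  return [...byLabel].map(
    ([label, w]) => `${label} (${formatWindow(w.windowMinutes)}): ${w.first} -> ${w.last}%`
  );
}
```

A provider reporting a third window would simply produce a third line, since labels are discovered from the data.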
33 tests pass (was 32) — adds Symphony-spec-default cwd test,
adds tests for each new breakdown/quota OPTIONAL field, updates
existing aggregator quota fixtures to the new spec shape.
Summary
Previously the backfilled `usage.jsonl` dropped three high-signal cost fields:
- Codex quota readouts
- Codex reasoning-output tokens
- Claude cache-tier splits
The cost-telemetry spec allows OPTIONAL field additions without bumping `schemaVersion` (per its own §6.3), so this PR adds them. The bake path is now lossless for every signal the cost analysis actually uses.
Spec additions (`specs/symphony-cost-telemetry-extension/SPEC.md`)
§5.2.1 Input-Token Breakdown
- `inputUncachedTokens`, `inputCachedReadTokens`, `inputCacheWriteTokens`
- `inputCacheWriteEphemeral5mTokens`, `inputCacheWriteEphemeral1hTokens` (Anthropic-only; other writers MUST omit)
§5.2.2 Output-Token Breakdown
- `outputVisibleTokens`, `outputReasoningTokens` (reasoning is Codex/o-series-only)
§5.2.3 Quota Sample
- `quota: { planType, windows: [{label, windowMinutes, usedPercent, resetsAt?}] }`
- Generic shape: Codex emits `primary` (5h) and `secondary` (7d), but any provider with any number of rate-limit windows fits the structure.
§5.3 Semantics adds SHOULD-sum relations across the breakdown buckets.
Implementation
- `transcripts/codex.mjs` emits per-turn quota samples in the spec's `windows` shape directly (no lossy reshape).
- `transcript-to-usage.mjs` emits every breakdown field the transcript carries plus the quota object.
- `usage-aggregator.mjs` prefers the breakdown fields when present; falls back to the REQUIRED `inputTokens`/`outputTokens` totals when not.
- `bin/llm-cost.mjs` quota printer is now generic over `windows` (renders every label the provider reports, not hard-coded to primary/secondary).
Framing fixes folded in
The broken "Symphony Telemetry Extension Spec" framing for the workspace convention was still on main (the per-issue-workspace requirement is in OpenAI Symphony's parent SPEC.md §4.1.4; there is no extension spec for it). This PR re-applies the correction that got lost when its prior worktree was torn down:
- `DEFAULT_CWD_PATTERN` broadened to match both spec-default `<system-temp>/symphony_workspaces/<ID>` and the common in-repo `<repo>/.symphony/workspaces/<ID>`.
- Issue-ID character class widened from `[A-Z]+-\d+` (Linear-specific) to `[A-Za-z0-9._-]+` (matches the spec's workspace_key sanitization rule).
README prose changes (root README + package README), per the user's explicit guidance:
- No prose claims that `llm-cost-attribution` REQUIRES any extension spec.
- The `usage.jsonl` bake feature is presented as built-in; spec interop with other tools is mentioned only as an optional side-benefit.
- OpenAI Symphony's parent SPEC.md (https://github.com/openai/symphony/blob/main/SPEC.md) is cited correctly for the per-issue-workspace convention it actually requires.
End-to-end verification on real data
Re-backfilled the full 4,309 sessions / 5 GB of transcripts on this machine:
- `usage.jsonl` size: 125 MB (was 83 MB before the optional-field additions)
- `llm-cost EPAC-1940 --from-usage` output matches transcript-source? Yes, identical
Sample backfilled Codex record now includes:

```json
{
  ...
  "inputUncachedTokens": 19975,
  "inputCachedReadTokens": 3456,
  "outputVisibleTokens": 135,
  "outputReasoningTokens": 60,
  "quota": {
    "planType": "pro",
    "windows": [
      { "label": "primary", "windowMinutes": 300, "usedPercent": 6, "resetsAt": 1778021087 },
      { "label": "secondary", "windowMinutes": 10080, "usedPercent": 7, "resetsAt": 1778548110 }
    ]
  }
}
```

Sample backfilled Claude record with cache writes:

```json
{
  ...
  "inputUncachedTokens": 3,
  "inputCachedReadTokens": 0,
  "inputCacheWriteTokens": 45728,
  "inputCacheWriteEphemeral5mTokens": 45728
}
```

Test plan
- 33 tests pass under `node --test`. Added: Symphony-spec-default cwd test, breakdown field tests, quota round-trip test. Updated: existing aggregator quota fixtures to the new spec shape.
- `node --check` clean on every .mjs (including the new `src/quota.mjs`)
- Backfilled records pass `validateUsageRecord`
- `llm-cost EPAC-1940` (transcripts) and `llm-cost EPAC-1940 --from-usage <file>` produce identical token totals, turn counts, model lists, wall-clock spans, AND quota readouts