Background
PR #78 (now closed) experimented with adding per-run Δ columns to the By behavior category table on the compare page (viewer/src/routes/suite/[suite_id]/compare/+page.svelte). The URL plumbing and N-way render loops that PR #78 also touched have since landed independently on main; the per-run Δ columns are the only piece that remains genuinely novel.
What this issue tracks
Add Δ columns to the per-behavior-row table in the compare view so a reviewer can see, for each non-baseline run, the rate delta vs the baseline run for each behavior category. Today the table shows the per-run count (e.g. 5/30 flagged) but not the Δ.
Desired UX
Header row gains one Δ column per non-baseline run, labeled Δ <run.display_name> with a tooltip showing the full run id. Each behavior row gains a matching cell rendered as +10.7 pp ▲ / -22.0 pp ▼ colored green/red, tabular-nums, right-aligned.
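A minimal sketch of the cell formatting described above, for orientation only. The helper names match the PR #78 snippet, but the bodies are assumptions: the input is taken to be a rate difference expressed as a fraction (0.107 = 10.7 pp), and the class names delta-up / delta-down / delta-flat are hypothetical placeholders, since this issue does not fix which sign maps to green and which to red.

```typescript
// Hypothetical bodies for the helpers named in PR #78; not the PR's actual code.
// `delta` is assumed to be a fraction: runRate - baselineRate.

// Round to one decimal place in percentage points so that the text,
// arrow, and class all agree on the sign of very small deltas.
function roundedPp(delta: number): number {
  return Math.round(delta * 1000) / 10;
}

function deltaText(delta: number): string {
  const pp = roundedPp(delta);
  const sign = pp > 0 ? '+' : pp < 0 ? '-' : '';
  return `${sign}${Math.abs(pp).toFixed(1)} pp`;
}

function deltaArrow(delta: number): string {
  const pp = roundedPp(delta);
  return pp > 0 ? '▲' : pp < 0 ? '▼' : '';
}

// Class names are placeholders; map them to green/red in CSS as appropriate.
function deltaClass(delta: number): string {
  const pp = roundedPp(delta);
  return pp > 0 ? 'delta-up' : pp < 0 ? 'delta-down' : 'delta-flat';
}
```

Rounding before choosing the sign avoids cells such as "+0.0 pp ▲" when the underlying difference is smaller than the displayed precision.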
Reference snippets from PR #78
The closed PR contains a working draft of the implementation against an older compare-page layout (since rewritten on main to use PrimerDropdown):
- Grid template:
minmax(14rem, 1fr) repeat(\, 100px) repeat(\, 88px)
- Header:
Δ {run.display_name} with title={deltaHeaderTitle(run)}
- Cell:
{deltaText(rowDelta)} {deltaArrow(rowDelta)} with deltaClass(rowDelta) for color
The data-side helper deltaByRun: Record<string, Record<string, number>> was also drafted in viewer/src/lib/server/data.ts lines 120-126 of PR #78's branch.
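For reference, a rough sketch of how a deltaByRun value of that type could be computed. This is not the code from PR #78: the input shape (RunCounts, with flagged/total counts per category), the function name computeDeltaByRun, and the choice of run id as the outer key and behavior category as the inner key are all assumptions made for illustration.

```typescript
// Hypothetical input shape; the real data.ts types may differ.
type Counts = { flagged: number; total: number };
type RunCounts = { run_id: string; byCategory: Record<string, Counts> };

// Flag rate as a fraction, or null when there is nothing to divide by.
function rate(c: Counts | undefined): number | null {
  return c && c.total > 0 ? c.flagged / c.total : null;
}

// Returns run_id -> category -> (run rate - baseline rate).
// The baseline run itself gets no entry; categories missing on either
// side are skipped rather than reported as a delta of 0.
function computeDeltaByRun(
  runs: RunCounts[],
  baselineIdx = 0
): Record<string, Record<string, number>> {
  const out: Record<string, Record<string, number>> = {};
  const baseline = runs[baselineIdx];
  if (!baseline) return out;

  runs.forEach((run, i) => {
    if (i === baselineIdx) return;
    const perCategory: Record<string, number> = {};
    for (const [category, counts] of Object.entries(run.byCategory)) {
      const r = rate(counts);
      const b = rate(baseline.byCategory[category]);
      if (r !== null && b !== null) perCategory[category] = r - b;
    }
    out[run.run_id] = perCategory;
  });
  return out;
}
```

Skipping categories with no baseline data lets the cell render as empty instead of showing a misleading 0.0 pp.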
Acceptance
- Δ columns render correctly for 2-, 3-, and 4-run compare
- Default baseline is run index 0 (same convention as the existing baselineIdx)
- svelte-check clean
- Light + dark theme both readable
Why this matters
The bank-manager-demo style of compare (4 variants in one suite) benefits visibly from per-run deltas — readers shouldn't have to do mental arithmetic across columns. Useful for any multi-variant eval-fix loop demo.