viewer/compare: per-run Δ columns in the By-behavior-category table #161

@changliu2

Background

PR #78 (now closed) experimented with adding per-run Δ columns to the By behavior category table on the compare page (viewer/src/routes/suite/[suite_id]/compare/+page.svelte). The URL plumbing and N-way render loops that PR #78 also touched have since landed independently on main; the per-run Δ columns are the only piece that remains genuinely novel.

What this issue tracks

Add Δ columns to the per-behavior-row table in the compare view so a reviewer can see, for each non-baseline run, the rate delta vs the baseline run for each behavior category. Today the table shows the per-run count (e.g. 5/30 flagged) but not the Δ.

Desired UX

Header row gains one Δ column per non-baseline run, labeled Δ <run.display_name> with a tooltip showing the full run id. Each behavior row gains a matching cell rendered as +10.7 pp ▲ / -22.0 pp ▼ colored green/red, tabular-nums, right-aligned.

Reference snippets from PR #78

The closed PR contains a working draft of the implementation against an older compare-page layout (since rewritten on main to use PrimerDropdown):

  • Grid template: minmax(14rem, 1fr) repeat(\, 100px) repeat(\, 88px)
  • Header: Δ {run.display_name} with title={deltaHeaderTitle(run)}
  • Cell: {deltaText(rowDelta)} {deltaArrow(rowDelta)} with deltaClass(rowDelta) for color
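The cell helpers named above could look roughly like the following. This is a minimal sketch, not the code from PR #78: only the function names come from the snippets, while the bodies, the `Delta` type, the `n/a` fallback, and the CSS class names are assumptions. It assumes each delta is already expressed in percentage points.

```typescript
// Hypothetical bodies for the helpers referenced in the PR #78 snippets.
// Assumption: a delta is a number in percentage points (pp), or null when
// it cannot be computed (e.g. a run has no data for that category).
type Delta = number | null;

// Round to one decimal so that tiny values such as -0.04 render as "0.0 pp".
function roundPp(delta: number): number {
  return Math.round(delta * 10) / 10;
}

function isMissing(delta: Delta): delta is null {
  return delta === null || Number.isNaN(delta);
}

// "+10.7 pp", "-22.0 pp", "0.0 pp", or "n/a" (the fallback text is an assumption).
function deltaText(delta: Delta): string {
  if (isMissing(delta)) return "n/a";
  const rounded = roundPp(delta);
  if (rounded === 0) return "0.0 pp";
  const sign = rounded > 0 ? "+" : "-";
  return `${sign}${Math.abs(rounded).toFixed(1)} pp`;
}

// Up arrow for an increase, down arrow for a decrease, nothing otherwise.
function deltaArrow(delta: Delta): string {
  if (isMissing(delta) || roundPp(delta) === 0) return "";
  return delta > 0 ? "▲" : "▼";
}

// Placeholder class names; mapping them to green/red is left to the stylesheet.
function deltaClass(delta: Delta): string {
  if (isMissing(delta) || roundPp(delta) === 0) return "delta-neutral";
  return delta > 0 ? "delta-up" : "delta-down";
}
```

Keeping the direction in a class name rather than a hard-coded color lets the light and dark themes style the cell separately.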

The data-side helper deltaByRun: Record<string, Record<string, number>> was also drafted in viewer/src/lib/server/data.ts lines 120-126 of PR #78's branch.
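For orientation, one way such a helper could be computed is sketched below. The input types (`RunSummary`, `CategoryCount`), the function name `computeDeltaByRun`, and the key order (run id first, then category) are all assumptions; the actual draft in PR #78 may be shaped differently. Deltas are in percentage points relative to the baseline run.

```typescript
// Hypothetical input shapes; the real types in data.ts may differ.
interface CategoryCount {
  flagged: number;
  total: number;
}

interface RunSummary {
  id: string;
  byCategory: Record<string, CategoryCount>;
}

// Flag rate as a percentage, or null when there is nothing to divide by.
function rate(counts: CategoryCount | undefined): number | null {
  if (!counts || counts.total === 0) return null;
  return (counts.flagged / counts.total) * 100;
}

// Assumed layout: result[runId][category] = rate(run) - rate(baseline), in pp.
// The baseline run itself gets no entry; categories without a computable
// rate on either side are omitted.
function computeDeltaByRun(
  runs: RunSummary[],
  baselineIdx = 0,
): Record<string, Record<string, number>> {
  const result: Record<string, Record<string, number>> = {};
  const baseline = runs[baselineIdx];
  if (!baseline) return result;

  runs.forEach((run, idx) => {
    if (idx === baselineIdx) return;
    const perCategory: Record<string, number> = {};
    for (const [category, counts] of Object.entries(run.byCategory)) {
      const runRate = rate(counts);
      const baseRate = rate(baseline.byCategory[category]);
      if (runRate !== null && baseRate !== null) {
        perCategory[category] = runRate - baseRate;
      }
    }
    result[run.id] = perCategory;
  });
  return result;
}
```

With a baseline at 5/30 flagged and a variant at 8/30 flagged for the same category, the variant's entry comes out at roughly +10 pp.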

Acceptance

  • Δ columns render correctly for 2-, 3-, and 4-run compare
  • Default baseline is run index 0 (same convention as the existing baselineIdx)
  • svelte-check clean
  • Light + dark theme both readable

Why this matters

The bank-manager-demo style of compare (4 variants in one suite) benefits visibly from per-run deltas: readers shouldn't have to do mental arithmetic across columns. The same applies to any multi-variant eval-fix loop demo.

Metadata

Assignees

No one assigned

    Labels

    design, enhancement (New feature or request), follow-up (Polish or post-launch improvement)
