Disaggregated scorecards on the leaderboard (Move 1)

## Context

Surfaced by synthpanel-driven PMF survey (n=70, 4 persona packs, 2026-05-14). All 4 packs independently rejected a single-number leaderboard. Buyers want **disaggregated subgroup scorecards** — accuracy broken out by age × geography × education × question-type, with explicit error bands.

Q4 from the survey: 61% said they'd switch synthetic-respondent vendors based on a credible 3rd-party benchmark, but the open-ended evidence makes it conditional on *seeing the subgroup breakdown*.

## What to build

For every benchmarked vendor row on synthbench.org, add an expandable calibration matrix:

| Dimension | Cells |
|---|---|
| Age band | 18–24, 25–34, 35–49, 50–64, 65+ |
| Geography | Census regions (Northeast / Midwest / South / West) or finer if data supports it |
| Education | HS-or-less, Some college, BA, Grad |
| Question type | Numeric, Likert, Multi-choice, Open-ended-summarizable |

Each cell: vendor accuracy ± confidence interval, sample size, and a colour gradient.

The headline single-number score stays (people scan), but it's secondary; the subgroup matrix is the lede.

## Done when

- [ ] Per-vendor expandable matrix renders on synthbench.org/leaderboard/<vendor>
- [ ] Backing data API endpoint returns the matrix data
- [ ] At least 2 dimensions are populated for all currently-listed vendors
- [ ] Leading indicator instrumentation: track % of leaderboard sessions that expand into per-vendor view

## Leading indicator (success signal)

>40% of synthbench.org sessions click into the per-vendor disaggregated view within 30 days of launch.

## Out of scope (file as follow-ups if relevant)

- User-uploaded validation packs (separate move, see leaderboard issue stream)
- Reproducible run hashes (separate move, vendor-onboarding stream)
- Held-out validation set (separate move, integrity stream)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disaggregated scorecards on the leaderboard (Move 1) #255

Context

What to build

Done when

Leading indicator (success signal)

Out of scope (file as follow-ups if relevant)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Dimension	Cells
Age band	18–24, 25–34, 35–49, 50–64, 65+
Geography	Census regions (Northeast / Midwest / South / West) or finer if data supports it
Education	HS-or-less, Some college, BA, Grad
Question type	Numeric, Likert, Multi-choice, Open-ended-summarizable

Disaggregated scorecards on the leaderboard (Move 1) #255

Description

Context

What to build

Done when

Leading indicator (success signal)

Out of scope (file as follow-ups if relevant)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions