Disaggregated scorecards on the leaderboard (Move 1) #255

@openclaw-dv

Context

Surfaced by synthpanel-driven PMF survey (n=70, 4 persona packs, 2026-05-14). All 4 packs independently rejected a single-number leaderboard. Buyers want disaggregated subgroup scorecards — accuracy broken out by age × geography × education × question-type, with explicit error bands.

Q4 from the survey: 61% said they'd switch synthetic-respondent vendors based on a credible 3rd-party benchmark, but the open-ended evidence makes it conditional on seeing the subgroup breakdown.

What to build

For every benchmarked vendor row on synthbench.org, add an expandable calibration matrix:

| Dimension | Cells |
|---|---|
| Age band | 18–24, 25–34, 35–49, 50–64, 65+ |
| Geography | Census regions (Northeast / Midwest / South / West) or finer if data supports it |
| Education | HS-or-less, Some college, BA, Grad |
| Question type | Numeric, Likert, Multi-choice, Open-ended-summarizable |

Each cell: vendor accuracy ± confidence interval, sample size, and a colour gradient.
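One way a single cell's "accuracy ± confidence interval, sample size" could be computed is sketched below. It assumes accuracy is a simple proportion (matched items out of `n`) and uses a Wilson score interval, which behaves better than the normal approximation in small cells. The issue does not specify an interval method, so both the method and the names `CellStat` and `wilson_cell` are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CellStat:
    """One matrix cell: point estimate, interval bounds, and sample size."""
    accuracy: float
    ci_low: float
    ci_high: float
    n: int


def wilson_cell(hits: int, n: int, z: float = 1.96) -> CellStat:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%).

    Assumption: accuracy in a cell is hits / n. The real benchmark
    metric may be defined differently.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return CellStat(p, max(0.0, centre - half), min(1.0, centre + half), n)
```

Reporting `n` alongside the interval lets the UI grey out or flag cells where the sample is too small for the band to be meaningful.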

The headline single-number score stays (people scan), but it's secondary; the subgroup matrix is the lede.

Done when

  • Per-vendor expandable matrix renders on synthbench.org/leaderboard/
  • Backing data API endpoint returns the matrix data
  • At least 2 dimensions are populated for all currently-listed vendors
  • Leading indicator instrumentation: track % of leaderboard sessions that expand into per-vendor view
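As a rough sketch of what the backing endpoint might return, the snippet below groups per-cell records by dimension into one JSON-serializable payload and checks the "at least 2 dimensions populated" criterion. The field names (`vendor_id`, `dimensions`, `populated_dimensions`) and both function names are hypothetical, not an existing synthbench API. A dimension is treated as populated when at least one of its cells has `n > 0`, which is also an assumption.

```python
import json
from collections import defaultdict
from typing import Any, Iterable


def build_matrix_payload(vendor_id: str, cells: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Group flat cell records into a per-dimension matrix payload.

    Each input record is assumed to carry: dimension, cell, accuracy,
    ci_low, ci_high, n.
    """
    dims: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for c in cells:
        dims[c["dimension"]].append({
            "cell": c["cell"],
            "accuracy": c["accuracy"],
            "ci_low": c["ci_low"],
            "ci_high": c["ci_high"],
            "n": c["n"],
        })
    # Count only dimensions that have at least one cell with data.
    populated = sum(1 for rows in dims.values() if any(r["n"] > 0 for r in rows))
    return {
        "vendor_id": vendor_id,
        "dimensions": dict(dims),
        "populated_dimensions": populated,
    }


def meets_minimum_coverage(payload: dict[str, Any], minimum: int = 2) -> bool:
    """True when the vendor has at least `minimum` populated dimensions."""
    return payload["populated_dimensions"] >= minimum


example = build_matrix_payload("vendor-a", [
    {"dimension": "age_band", "cell": "18-24", "accuracy": 0.71,
     "ci_low": 0.64, "ci_high": 0.77, "n": 180},
    {"dimension": "education", "cell": "BA", "accuracy": 0.68,
     "ci_low": 0.60, "ci_high": 0.75, "n": 150},
])
body = json.dumps(example)  # what the endpoint would send
```

The values in `example` are placeholders for illustration only, not benchmark results.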

Leading indicator (success signal)

Target: 40% of synthbench.org sessions expand into the per-vendor disaggregated view within 30 days of launch.
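The indicator can be computed from a plain event log, as in this minimal sketch. It assumes events arrive as `(session_id, event_name)` pairs and that the denominator is sessions that viewed the leaderboard. The event names `leaderboard_view` and `vendor_matrix_expand` are hypothetical; substitute whatever the analytics setup actually emits.

```python
from typing import Sequence


def expand_rate(
    events: Sequence[tuple[str, str]],
    view_event: str = "leaderboard_view",
    expand_event: str = "vendor_matrix_expand",
) -> float:
    """Share of leaderboard sessions that expanded a per-vendor matrix."""
    viewed = {sid for sid, name in events if name == view_event}
    expanded = {sid for sid, name in events if name == expand_event}
    if not viewed:
        return 0.0
    # Only count expansions from sessions that also registered a view.
    return len(viewed & expanded) / len(viewed)


def hits_target(rate: float, target: float = 0.40) -> bool:
    """Compare the observed rate against the 40% success signal."""
    return rate >= target
```

Counting distinct sessions rather than raw clicks keeps one enthusiastic user from inflating the signal.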

Out of scope (file as follow-ups if relevant)

  • User-uploaded validation packs (separate move, see leaderboard issue stream)
  • Reproducible run hashes (separate move, vendor-onboarding stream)
  • Held-out validation set (separate move, integrity stream)

Labels

  • enhancement: New feature or request
  • leaderboard: Leaderboard UX + data model
  • pmf-research: Surfaced from synthpanel-driven PMF research 2026-05-14
