Context
Surfaced by synthpanel-driven PMF survey (n=70, 4 persona packs, 2026-05-14). All 4 packs independently rejected a single-number leaderboard. Buyers want disaggregated subgroup scorecards — accuracy broken out by age × geography × education × question-type, with explicit error bands.
Q4 from the survey: 61% said they'd switch synthetic-respondent vendors based on a credible 3rd-party benchmark, but the open-ended evidence makes it conditional on seeing the subgroup breakdown.
What to build
For every benchmarked vendor row on synthbench.org, add an expandable calibration matrix:
| Dimension |
Cells |
| Age band |
18–24, 25–34, 35–49, 50–64, 65+ |
| Geography |
Census regions (Northeast / Midwest / South / West) or finer if data supports it |
| Education |
HS-or-less, Some college, BA, Grad |
| Question type |
Numeric, Likert, Multi-choice, Open-ended-summarizable |
Each cell: vendor accuracy ± confidence interval, sample size, and a colour gradient.
The headline single-number score stays (people scan), but it's secondary; the subgroup matrix is the lede.
Done when
Leading indicator (success signal)
40% of synthbench.org sessions click into the per-vendor disaggregated view within 30 days of launch.
Out of scope (file as follow-ups if relevant)
- User-uploaded validation packs (separate move, see leaderboard issue stream)
- Reproducible run hashes (separate move, vendor-onboarding stream)
- Held-out validation set (separate move, integrity stream)
Context
Surfaced by synthpanel-driven PMF survey (n=70, 4 persona packs, 2026-05-14). All 4 packs independently rejected a single-number leaderboard. Buyers want disaggregated subgroup scorecards — accuracy broken out by age × geography × education × question-type, with explicit error bands.
Q4 from the survey: 61% said they'd switch synthetic-respondent vendors based on a credible 3rd-party benchmark, but the open-ended evidence makes it conditional on seeing the subgroup breakdown.
What to build
For every benchmarked vendor row on synthbench.org, add an expandable calibration matrix:
Each cell: vendor accuracy ± confidence interval, sample size, and a colour gradient.
The headline single-number score stays (people scan), but it's secondary; the subgroup matrix is the lede.
Done when
Leading indicator (success signal)
Out of scope (file as follow-ups if relevant)