Parent
#13
What to build
The statistics layer, decoupled from collection - it reads persisted results only and never calls a provider. Establishes the single source of truth for every reported number.
- Shared metric helpers (domain accuracy, composite) that every downstream consumer must use - no second computation path may exist.
- Bootstrapped 95% confidence intervals on the headline conjunctive accuracy.
- Per-part marginal accuracies and per-skill / per-bundle breakdowns (which causal sub-skills fail).
- Refusal and invalid reported as separate columns, never folded into accuracy.
stats (console summary) and export (JSON) CLI commands.
Acceptance criteria
Blocked by
Parent
#13
What to build
The statistics layer, decoupled from collection - it reads persisted results only and never calls a provider. Establishes the single source of truth for every reported number.
stats(console summary) andexport(JSON) CLI commands.Acceptance criteria
exportoutput is deterministic from the results filesBlocked by