feat(metrics+stats): MetricRecord contract + bootstrap/Newey-West/Wilson intervals (v0.52.0)#96

Merged
cipher813 merged 2 commits into main from feat/metric-record-and-stat-intervals on Jun 4, 2026

@cipher813
Owner

Phase A lib substrate for the Director arc (measurement-first foundation). Authoritative spec: alpha-engine-docs/private/director-implementation-plan-260604.md §6 Phase A.

What

  • alpha_engine_lib/metrics.py: MetricRecord, the System Report Card v2 per-component contract. Fields: value + ci_low/high + ci_method + n_samples/n_floor + target/red_line + trend_4w/13w + 7-state status taxonomy (GREEN/WATCH/RED + the four N/A-* states that replace the legacy generic "insufficient data") + criticality + derived letter. Shared chokepoint so the producer (evaluator grading, Option B) and every consumer (console, public site) agree on schema and status semantics.
    • Pure helpers: derive_status (direction-aware via target/red-line ordering; encodes RC v2 Principles 2 + 6), derive_trend_decoration (Principle 5 glyphs), derive_letter (Principle 2 projection).
  • alpha_engine_lib/quant/stats/intervals.py: bootstrap_ci (seeded, reproducible percentile CI), newey_west_se (HAC SE, Bartlett kernel, Newey-West auto-lag), wilson_score_interval (small-N binomial rates). These are the three ci_method values MetricRecord references. numpy + stdlib only, no scipy (mirrors the existing quant/stats no-scipy pattern).
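
To illustrate what "seeded, reproducible percentile CI" means in practice, here is a minimal sketch using numpy only. The function name, signature, and defaults are hypothetical and are not taken from intervals.py; the real bootstrap_ci may differ in all of them.

```python
import numpy as np


def bootstrap_ci_sketch(samples, stat=np.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (illustrative only; hypothetical signature)."""
    x = np.asarray(samples, dtype=float)
    # A fixed seed makes the resampling, and therefore the interval, reproducible.
    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    stats = np.array([stat(x[row]) for row in idx])
    # Percentile method: take the alpha/2 and 1 - alpha/2 quantiles directly.
    low, high = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)


data = np.arange(50, dtype=float)
first = bootstrap_ci_sketch(data, seed=123)
second = bootstrap_ci_sketch(data, seed=123)
```

With the same seed and input, `first` and `second` are identical, which is the property a consumer comparing report cards across runs would rely on.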

Extends the existing quant/stats/ package (alongside multiple_testing/dsr/information_coefficient) rather than adding a parallel top-level module. Uses Literal, not StrEnum (house style + 3.9-compat; StrEnum only exists from Python 3.11).
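
The sketch below shows one way a Literal-typed status and a direction-aware derive_status could fit together under Python 3.9. It is an assumption-heavy illustration: the function name, parameters, and thresholds are hypothetical, the PR does not list the four N/A-* state names (a single placeholder stands in for them), and the actual RC v2 rules may differ.

```python
from typing import Literal, Optional

# Only GREEN/WATCH/RED are named in the PR text. The placeholder below is
# hypothetical and stands in for the four real N/A-* states.
Status = Literal["GREEN", "WATCH", "RED", "N/A-PLACEHOLDER"]


def derive_status_sketch(
    value: Optional[float],
    n_samples: int,
    n_floor: int,
    target: float,
    red_line: float,
) -> Status:
    # N/A takes precedence: do not grade a metric that lacks data.
    if value is None or n_samples < n_floor:
        return "N/A-PLACEHOLDER"
    # Direction is inferred from the ordering of target and red line
    # (assumed rule): target above red line means higher is better.
    higher_is_better = target > red_line
    if higher_is_better:
        if value >= target:
            return "GREEN"
        if value <= red_line:
            return "RED"
    else:
        if value <= target:
            return "GREEN"
        if value >= red_line:
            return "RED"
    return "WATCH"


# Higher-is-better metric (for example a hit rate).
hit_rate_status = derive_status_sketch(0.60, 100, 30, target=0.55, red_line=0.50)
# Lower-is-better metric (for example drawdown): target sits below the red line.
drawdown_status = derive_status_sketch(0.25, 100, 30, target=0.10, red_line=0.20)
```

Inferring direction from the target/red-line ordering avoids a separate "higher is better" flag that could drift out of sync with the thresholds.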

Tests

31 new tests (test_metrics.py, test_quant_stats_intervals.py) covering: known-value Wilson 50/100 = [0.4038, 0.5962], zero-lag Newey-West = iid SE, N/A precedence, and lower-is-better direction (drawdown/ECE). Full suite: 1138 passed.
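
The two known-value checks above can be reproduced with a short numpy + stdlib sketch. These are textbook forms of the Wilson score interval and a Bartlett-kernel Newey-West standard error of the mean, not the code in intervals.py. The function names and signatures are hypothetical, and two choices are assumptions: the auto-lag rule floor(4 * (n/100)^(2/9)), and the 1/n (population) variance, so "iid SE" here means std(ddof=0)/sqrt(n).

```python
import math
from statistics import NormalDist

import numpy as np


def wilson_sketch(successes, n, confidence=0.95):
    """Wilson score interval for a binomial rate (hypothetical signature)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # stdlib normal quantile
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1], since a rate cannot leave that range.
    return max(0.0, center - half), min(1.0, center + half)


def newey_west_se_sketch(x, lags=None):
    """HAC standard error of the mean with Bartlett weights (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if lags is None:
        # Assumed auto-lag rule of thumb; the library's rule may differ.
        lags = int(math.floor(4 * (n / 100) ** (2 / 9)))
    d = x - x.mean()
    long_run_var = d @ d / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1 - k / (lags + 1)  # Bartlett kernel
        long_run_var += 2 * weight * (d[k:] @ d[:-k]) / n
    return math.sqrt(long_run_var / n)


low, high = wilson_sketch(50, 100)
series = np.array([0.01, -0.02, 0.015, 0.03, -0.01, 0.005, 0.02, -0.015])
nw_zero_lag = newey_west_se_sketch(series, lags=0)
iid_se = series.std() / math.sqrt(series.size)
```

Rounded to four places, `low` and `high` give 0.4038 and 0.5962, and with `lags=0` the autocovariance loop is skipped, so `nw_zero_lag` reduces to `iid_se`.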

Version 0.51.0 → 0.52.0 (additive minor). Leaving open for review/merge.

🤖 Generated with Claude Code

cipher813 and others added 2 commits June 4, 2026 09:23
…son intervals (v0.52.0)

Director plan Phase A lib substrate (alpha-engine-docs/private/director-implementation-plan-260604.md):

- metrics.py: MetricRecord — the System Report Card v2 per-component contract
  (value + CI + N/floor + target/red-line + trend + 7-state status taxonomy +
  criticality), shared by producer (evaluator grading) and consumers (console,
  public site). Pure helpers derive_status (GREEN/WATCH/RED + 4-state N/A,
  direction-aware), derive_trend_decoration, derive_letter encode the RC v2
  status semantics at the chokepoint so all surfaces agree.
- quant/stats/intervals.py: bootstrap_ci (seeded/reproducible), newey_west_se
  (HAC, Bartlett kernel, auto-lag), wilson_score_interval (small-N rates) — the
  three CI methods MetricRecord.ci_method references. numpy + stdlib, no scipy.
- 31 tests (known-value Wilson 50/100, zero-lag NW = iid SE, N/A precedence,
  lower-is-better direction); full suite 1138 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… not exact 0/1)

CI surfaced wilson_score_interval(0,10) returning ci_low=2.78e-17 (legitimate
float noise from the [0,1] clamp; true bound is 0). Assert pytest.approx instead
of exact equality. Local numpy happened to compute exact 0.0; CI did not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
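
The float-noise issue described in this commit can be sketched with a textbook Wilson lower bound. The helper name and signature are hypothetical, and math.isclose is used as a stdlib stand-in for pytest.approx so the snippet runs without pytest. For 0 successes the center and half-width are mathematically equal, so their difference is 0 in exact arithmetic but may come out as a tiny positive float that survives the clamp.

```python
import math
from statistics import NormalDist


def wilson_low_sketch(successes, n, confidence=0.95):
    """Lower Wilson bound only (illustrative; hypothetical signature)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # max() removes negative noise but leaves tiny positive noise untouched.
    return max(0.0, center - half)


low = wilson_low_sketch(0, 10)

# Tolerance-based check. An explicit abs_tol is required: a relative tolerance
# alone cannot match a target of 0.0 unless the value is exactly 0.0.
is_zero_within_tol = math.isclose(low, 0.0, abs_tol=1e-12)
```

In a pytest suite the equivalent assertion would be `assert low == pytest.approx(0.0)`, which applies a small default absolute tolerance and so passes whether the platform yields exactly 0.0 or a value on the order of 1e-17.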
@cipher813 cipher813 merged commit a18663e into main Jun 4, 2026
6 checks passed
@cipher813 cipher813 deleted the feat/metric-record-and-stat-intervals branch June 4, 2026 16:36