feat(metrics+stats): MetricRecord contract + bootstrap/Newey-West/Wilson intervals (v0.52.0) #96
Merged
…son intervals (v0.52.0)

Director plan Phase A lib substrate (alpha-engine-docs/private/director-implementation-plan-260604.md):

- metrics.py: MetricRecord, the System Report Card v2 per-component contract (value + CI + N/floor + target/red-line + trend + 7-state status taxonomy + criticality), shared by producer (evaluator grading) and consumers (console, public site). Pure helpers derive_status (GREEN/WATCH/RED + 4-state N/A, direction-aware), derive_trend_decoration, and derive_letter encode the RC v2 status semantics at the chokepoint so all surfaces agree.
- quant/stats/intervals.py: bootstrap_ci (seeded/reproducible), newey_west_se (HAC, Bartlett kernel, auto-lag), wilson_score_interval (small-N rates). These are the three CI methods that MetricRecord.ci_method references. numpy + stdlib, no scipy.
- 31 tests (known-value Wilson 50/100, zero-lag NW = iid SE, N/A precedence, lower-is-better direction); full suite 1138 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
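The direction-aware status rule described in this commit can be illustrated with a minimal sketch. It is not the library's derive_status: the function name `sketch_status`, its signature, and the single generic "N/A" placeholder (standing in for the four N/A states, whose names are not given here) are all assumptions. The sketch only assumes that direction is inferred from whether the target sits above or below the red line, and that N/A takes precedence over any color.

```python
from typing import Literal, Optional

# Hypothetical, simplified taxonomy: one "N/A" stands in for the four N/A states.
Status = Literal["GREEN", "WATCH", "RED", "N/A"]


def sketch_status(
    value: Optional[float],
    target: float,
    red_line: float,
    n_samples: int,
    n_floor: int,
) -> Status:
    """Illustrative status rule; names and thresholds are assumptions."""
    # N/A precedence: with no value or too few samples, never emit a color.
    if value is None or n_samples < n_floor:
        return "N/A"
    # Direction is inferred from the ordering of target and red line.
    higher_is_better = target > red_line
    if higher_is_better:
        if value >= target:
            return "GREEN"
        if value <= red_line:
            return "RED"
    else:
        if value <= target:
            return "GREEN"
        if value >= red_line:
            return "RED"
    # Anything between the target and the red line is a warning state.
    return "WATCH"


# Higher-is-better metric (for example a hit rate): target above red line.
hit_rate_status = sketch_status(0.60, target=0.55, red_line=0.45, n_samples=100, n_floor=30)
# Lower-is-better metric (for example drawdown): target below red line.
drawdown_status = sketch_status(0.12, target=0.10, red_line=0.20, n_samples=100, n_floor=30)
# Too few samples: N/A wins even though the value would otherwise be GREEN.
thin_status = sketch_status(0.60, target=0.55, red_line=0.45, n_samples=5, n_floor=30)
```

Inferring direction from the target/red-line ordering means a lower-is-better metric such as drawdown needs no separate flag in this sketch.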
… not exact 0/1)

The CI run surfaced wilson_score_interval(0, 10) returning ci_low=2.78e-17 (legitimate float noise from the [0,1] clamp; the true bound is 0). Assert pytest.approx instead of exact equality. Local numpy happened to compute exactly 0.0; the CI run did not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
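To show why an exact-equality assertion is fragile here, the following is a minimal stdlib sketch of the textbook Wilson score interval. It is not the library's wilson_score_interval: the name `wilson_sketch`, its signature, and the hard-coded 95% z-value are assumptions. At 0 successes the lower bound is the difference of two algebraically equal terms, so it can come out as exactly 0.0 or as a tiny positive residue depending on the platform's floating-point arithmetic.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile (assumed default)


def wilson_sketch(successes: int, n: int, z: float = Z_95):
    """Textbook Wilson score interval for a binomial rate, clamped to [0, 1]."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # The clamp removes negative residue, but a tiny positive one can remain.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Known value from the test list: 50/100 is roughly [0.4038, 0.5962].
lo50, hi50 = wilson_sketch(50, 100)

# Edge case from this commit: compare with a tolerance, not with == 0.0.
lo0, hi0 = wilson_sketch(0, 10)
lower_bound_is_zero = math.isclose(lo0, 0.0, abs_tol=1e-12)
```

The tolerance check plays the same role as pytest.approx in the fixed test: it accepts both an exact 0.0 and a residue on the order of 1e-17.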
Phase A lib substrate for the Director arc (measurement-first foundation). Authoritative spec: alpha-engine-docs/private/director-implementation-plan-260604.md §6 Phase A.

What

- alpha_engine_lib/metrics.py: MetricRecord, the System Report Card v2 per-component contract: value + ci_low/high + ci_method + n_samples/n_floor + target/red_line + trend_4w/13w + 7-state status taxonomy (GREEN/WATCH/RED + the four N/A-* states that replace the legacy generic "insufficient data") + criticality + derived letter. Shared chokepoint so the producer (evaluator grading, Option B) and every consumer (console, public site) agree on schema and status semantics. Pure helpers: derive_status (direction-aware via target/red-line ordering; encodes RC v2 Principles 2 + 6), derive_trend_decoration (Principle 5 glyphs), derive_letter (Principle 2 projection).
- alpha_engine_lib/quant/stats/intervals.py: bootstrap_ci (seeded, reproducible percentile CI), newey_west_se (HAC SE, Bartlett kernel, Newey-West auto-lag), wilson_score_interval (small-N binomial rates). The three ci_method values MetricRecord references. numpy + stdlib, no scipy (mirrors the existing quant/stats no-scipy pattern).

Extends the existing quant/stats/ package (alongside multiple_testing / dsr / information_coefficient), not a parallel top-level module. Literal, not StrEnum (house style + 3.9-compat).

Tests

31 new (test_metrics.py, test_quant_stats_intervals.py): known-value Wilson 50/100 = [0.4038, 0.5962], zero-lag Newey-West = iid SE, N/A precedence, lower-is-better direction (drawdown/ECE). Full suite 1138 passed.

Version 0.51.0 → 0.52.0 (additive minor). Leaving open for review/merge.
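The "zero-lag Newey-West = iid SE" check can be illustrated with a small stdlib sketch of a Bartlett-kernel HAC standard error for a sample mean. This is not the library's newey_west_se: the name `newey_west_se_sketch`, the explicit `lags` argument (no auto-lag rule), and the 1/n autocovariance convention are assumptions, so "iid SE" here means the population-variance form sqrt(var / n).

```python
import math
from typing import Sequence


def newey_west_se_sketch(x: Sequence[float], lags: int) -> float:
    """HAC standard error of the mean with Bartlett weights (illustrative only)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 <= lags < n:
        raise ValueError("lags must lie in [0, n)")
    mean = sum(x) / n
    dev = [v - mean for v in x]

    def autocov(k: int) -> float:
        # Sample autocovariance at lag k, using the 1/n convention.
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    long_run_var = autocov(0)
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)  # Bartlett kernel, decays linearly to zero
        long_run_var += 2.0 * weight * autocov(k)
    return math.sqrt(max(long_run_var, 0.0) / n)


# With zero lags only the variance term remains, so the result is the iid SE.
returns = [0.01, -0.02, 0.015, 0.03, -0.005, 0.02]
n_obs = len(returns)
mean_ret = sum(returns) / n_obs
iid_se = math.sqrt(sum((v - mean_ret) ** 2 for v in returns) / n_obs / n_obs)
se_zero_lag = newey_west_se_sketch(returns, lags=0)

# A positively autocorrelated series widens the SE once a lag is included.
blocky = [1.0] * 3 + [-1.0] * 3 + [1.0] * 3 + [-1.0] * 3
se_blocky_lag0 = newey_west_se_sketch(blocky, lags=0)
se_blocky_lag1 = newey_west_se_sketch(blocky, lags=1)
```

If the library's iid SE uses the sample variance (ddof=1) instead, the zero-lag identity would hold only up to the corresponding n/(n-1) factor.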
🤖 Generated with Claude Code