feat: role-aware health baselines (ADR-040) by JLRansom · Pull Request #29 · JLRansom/openwaypoint

JLRansom · 2026-03-19T03:13:37Z

Summary

Adds lib/health-baselines.ts — a dedicated module that computes 30-day median baselines per agent role (medianCompletionRate, medianErrorDensity, medianWeeklyThroughput) with a 5-minute in-memory TTL cache
Adds applyRoleBaseline() to lib/health.ts, which normalises raw sub-metrics against the role's median before badge thresholds are applied: a researcher at 72% completion looks healthy against a 70% role norm but alarming against a 92% norm
Adds dbGetTaskRunsByRole() to taskRunRepo.ts for efficient single-query role-cohort fetches (vs N per-agent queries)
Wires everything behind the ROLE_BASELINES_ENABLED=true feature flag in health-cache.ts; the flag is off by default, so deploying leaves behaviour unchanged and the rollout can be compared A/B
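
The median-plus-cache idea in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in lib/health-baselines.ts: medianOf is named in the test plan, but its signature, the TtlCache class, and the getFresh/getStale helpers are hypothetical names introduced for this example.

```typescript
// Sketch only: signatures and the TtlCache class are assumptions, not the PR's code.

// Median of a numeric sample; returns null for an empty sample (assumed convention).
function medianOf(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const BASELINE_TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as stated in the summary

interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

// In-memory TTL cache keyed by role. Expired entries are kept so that a caller
// can still fall back to a stale value when a refresh fails.
class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  // Returns the value only while it is younger than the TTL.
  getFresh(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry || this.now() - entry.storedAt >= this.ttlMs) return undefined;
    return entry.value;
  }

  // Returns the value regardless of age.
  getStale(key: string): T | undefined {
    return this.entries.get(key)?.value;
  }
}
```

Injecting the clock through the constructor is a choice made here so that TTL expiry can be tested without waiting; the PR's tests may handle time differently.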

Degradation contract

Every failure mode has a defined fallback (no silent accuracy bugs):

| Condition | Behaviour |
| --- | --- |
| Role cohort < 3 qualifying agents | `null`-metrics baseline → flat thresholds |
| Agent < 5 runs in rolling window | `hasEnoughData: false` unchanged |
| Baseline metric null or zero | Skip normalisation for that sub-metric |
| Division by zero guard | Raw value preserved |
| New/unknown agent type | cohortSize = 0 → flat thresholds |
| DB error on baseline fetch | Stale cache returned; no cache → null-metrics |
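
The guards in this contract could be expressed along the following lines. This is a hedged sketch: the PR does not show the normalisation formula, so dividing the raw value by the role median is an assumption, as are the field names on both interfaces and the names normaliseAgainst and applyRoleBaselineSketch.

```typescript
// Sketch only: field names, the ratio formula, and all function names below
// are assumptions for illustration, not the PR's implementation.

interface BaselineNorms {
  cohortSize: number;
  medianCompletionRate: number | null;
  medianErrorDensity: number | null;
  medianWeeklyThroughput: number | null;
}

interface SubMetrics {
  hasEnoughData: boolean;
  completionRate: number;
  errorDensity: number;
  weeklyThroughput: number;
}

const MIN_COHORT_SIZE = 3;

// Divide by the role median; keep the raw value when the median is unusable
// (null, zero, or non-finite), which also covers the division-by-zero guard.
function normaliseAgainst(raw: number, median: number | null): number {
  if (median === null || median === 0 || !Number.isFinite(median)) return raw;
  return raw / median;
}

function applyRoleBaselineSketch(
  raw: SubMetrics,
  baseline: BaselineNorms | null,
): SubMetrics {
  // Agent has too few runs: leave its metrics untouched.
  if (!raw.hasEnoughData) return raw;
  // Missing baseline or thin cohort: fall back to flat thresholds on raw values.
  if (baseline === null || baseline.cohortSize < MIN_COHORT_SIZE) return raw;
  return {
    hasEnoughData: true,
    completionRate: normaliseAgainst(raw.completionRate, baseline.medianCompletionRate),
    errorDensity: normaliseAgainst(raw.errorDensity, baseline.medianErrorDensity),
    weeklyThroughput: normaliseAgainst(raw.weeklyThroughput, baseline.medianWeeklyThroughput),
  };
}
```

Under this assumed ratio, a 72% completion rate against a 70% role median gives about 1.03, while the same rate against a 92% median gives about 0.78.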

Design decision: `BaselineNorms` in `health.ts`

applyRoleBaseline lives in lib/health.ts. To avoid a circular import (health-baselines.ts → health.ts → health-baselines.ts), a minimal BaselineNorms interface is defined in health.ts. TypeScript's structural typing means RoleBaseline satisfies it automatically — no explicit extends needed, no coupling.
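
A small sketch of the structural-typing point, shown in one file for brevity. The members listed on each interface and the helper hasUsableCompletionNorm are assumptions; the PR description does not enumerate the actual fields.

```typescript
// Sketch only: the fields shown are assumptions; the PR does not list them.

// --- would live in lib/health.ts ---
interface BaselineNorms {
  cohortSize: number;
  medianCompletionRate: number | null;
}

// Hypothetical consumer that depends only on the minimal interface.
function hasUsableCompletionNorm(norms: BaselineNorms): boolean {
  return norms.cohortSize >= 3 && norms.medianCompletionRate !== null;
}

// --- would live in lib/health-baselines.ts ---
// No `extends BaselineNorms`, and no import of that interface is needed here.
interface RoleBaseline {
  role: string;
  cohortSize: number;
  medianCompletionRate: number | null;
  computedAt: number;
}

const testerBaseline: RoleBaseline = {
  role: "tester",
  cohortSize: 4,
  medianCompletionRate: 0.7,
  computedAt: 0,
};

// Compiles because RoleBaseline has every member that BaselineNorms requires;
// the extra members (role, computedAt) are ignored by the consumer.
const usable = hasUsableCompletionNorm(testerBaseline);
```

Because the richer type is passed as a variable rather than an object literal, TypeScript's excess-property check does not apply and the call type-checks.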

Feature flag

Set ROLE_BASELINES_ENABLED=true to activate. Recommended activation sequence:

Deploy (flag off) — verify no regressions
Enable on staging — compare badge distributions before/after
Enable on production — monitor badge flip rate per role for 2–3 weeks
Target: 30–40% reduction in noisy "declining" badges for low-volume roles (tester, researcher)
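
For illustration, a flag gate of this kind might look like the sketch below. The exact-match rule on the string "true" follows the activation instruction above, but whether health-cache.ts parses the variable this way is an assumption, and roleBaselinesEnabled and selectMetrics are hypothetical names.

```typescript
// Sketch only: the function names and the exact-match rule are assumptions.
type Env = Record<string, string | undefined>;

// Off unless the variable is exactly "true", so an unset flag keeps raw metrics.
function roleBaselinesEnabled(env: Env): boolean {
  return env["ROLE_BASELINES_ENABLED"] === "true";
}

// Applies the normaliser only when the flag is on; otherwise returns raw metrics.
function selectMetrics<T>(env: Env, raw: T, normalise: (metrics: T) => T): T {
  return roleBaselinesEnabled(env) ? normalise(raw) : raw;
}
```

In a Node process the env argument would typically be process.env; taking it as a parameter keeps the sketch testable without touching global state.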

Prerequisites

PR fix: explicit status classification in analyticsRepo #26 (fix/analytics-status-classification) merged
PR feat: agent health scores — per-agent sub-metric badges #27 (feat/agent-health-metrics) merged
2–3 weeks of production run data collected (≥3 agents per role with ≥5 runs/week)

Test plan

__tests__/unit/health-baselines.test.ts — 34 new tests covering medianOf, computeRoleBaseline, getRoleBaseline cache TTL/invalidation, and all applyRoleBaseline normalisation rules and guards
__tests__/unit/health-cache.test.ts — 4 new integration tests: flag off → raw metrics, flag on → normalised metrics, unknown agent → graceful degradation, cache hit after normalisation
All 201 existing tests pass unchanged (zero regressions)

🤖 Generated with Claude Code

Add lib/health-baselines.ts, a dedicated module that computes and caches 30-day median baselines (completionRate, errorDensity, weeklyThroughput) per agent role. A new applyRoleBaseline() in lib/health.ts normalises each agent's raw sub-metrics relative to its role cohort's median before badge thresholds are applied, so a tester at 72% completion looks healthy if testers average 70% but alarming if they average 92%.

Key design decisions:
- BaselineNorms interface lives in health.ts (not health-baselines.ts) to avoid a circular import; structural typing lets RoleBaseline satisfy it
- MIN_COHORT_SIZE = 3 guard falls back to flat thresholds for thin cohorts
- BASELINE_TTL_MS = 5 min cache; BASELINE_WINDOW_MS = 30 days look-back
- Feature-flagged via ROLE_BASELINES_ENABLED env var (default off) for safe A/B comparison and emergency rollback without a code revert
- Explicit degradation ladder: cohort < 3 → flat thresholds; agent < 5 runs → skip; baseline metric null or 0 → skip; DB error → stale cache or null-metrics; new role → cohort = 0 → flat thresholds

Files changed:
- lib/health-baselines.ts (new)
- lib/health.ts (+applyRoleBaseline, BaselineNorms, MIN_COHORT_SIZE_FOR_BASELINE)
- lib/health-cache.ts (role lookup + feature flag integration)
- lib/db/repositories/taskRunRepo.ts (+dbGetTaskRunsByRole)
- __tests__/unit/health-baselines.test.ts (new, 34 tests)
- __tests__/unit/health-cache.test.ts (extended, +4 integration tests)
- context/decisions.md (ADR-040)
- docs/agent-types.md (new role recalibration requirement documented)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


JLRansom and others added 2 commits March 18, 2026 22:13

chore: update current-sprint.md — add PR #29 to active worktrees

8fce004

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

JLRansom merged commit 5f03156 into master Mar 19, 2026
2 checks passed

JLRansom deleted the feat/role-aware-health-baselines branch March 19, 2026 03:22

JLRansom merged 2 commits into master from feat/role-aware-health-baselines
