feat: role-aware health baselines (ADR-040) #29

Merged: JLRansom merged 2 commits into master from feat/role-aware-health-baselines on Mar 19, 2026

@JLRansom (Owner)

Summary

  • Adds lib/health-baselines.ts — a dedicated module that computes 30-day median baselines per agent role (medianCompletionRate, medianErrorDensity, medianWeeklyThroughput) with a 5-minute in-memory TTL cache
  • Adds applyRoleBaseline() to lib/health.ts, which normalises raw sub-metrics against the role's median before badge thresholds fire: a researcher at 72% completion looks healthy against a 70% role norm but alarming against a 92% norm
  • Adds dbGetTaskRunsByRole() to taskRunRepo.ts for efficient single-query role-cohort fetches (vs N per-agent queries)
  • Wires everything behind ROLE_BASELINES_ENABLED=true feature flag in health-cache.ts — off by default for safe A/B rollout and zero-risk deployment
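
The median and cache behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the code from lib/health-baselines.ts: `medianOf` and the 5-minute TTL are named in this PR, while `TtlCache`, `getFresh`, and `getStale` are hypothetical names, and the injectable clock is an assumption made so the expiry logic is easy to test.

```typescript
// Sketch only: medianOf and the 5-minute TTL come from the PR text;
// TtlCache and its method names are hypothetical.

/** Median of a list of numbers, or null when the list is empty. */
function medianOf(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

const BASELINE_TTL_MS = 5 * 60 * 1000;

class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Returns the cached value only while it is younger than the TTL. */
  getFresh(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    return this.now() - entry.storedAt < this.ttlMs ? entry.value : undefined;
  }

  /** Returns the cached value regardless of age, e.g. as a fallback after a DB error. */
  getStale(key: string): T | undefined {
    return this.entries.get(key)?.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}
```

Keeping a separate stale read is one way to support the "stale cache returned" fallback without extending the freshness window for normal reads.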

Degradation contract

Every failure mode has a defined fallback (no silent accuracy bugs):

| Condition | Behaviour |
| --- | --- |
| Role cohort < 3 qualifying agents | null-metrics baseline → flat thresholds |
| Agent < 5 runs in rolling window | hasEnoughData: false (unchanged) |
| Baseline metric null or zero | Skip normalisation for that sub-metric |
| Division by zero guard | Raw value preserved |
| New/unknown agent type | cohortSize = 0 → flat thresholds |
| DB error on baseline fetch | Stale cache returned; no cache → null-metrics |
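
A minimal sketch of how the per-metric guards in this contract might look in code. The PR does not show the normalisation formula, so the plain ratio `raw / roleMedian` is an illustrative stand-in, and `normaliseMetric` and `NormalisedMetric` are hypothetical names. Only the cohort minimum of 3 and the null/zero fallbacks come from the contract itself.

```typescript
// Sketch only. The minimum cohort size of 3 comes from the PR; the ratio
// formula and every name below are assumptions for illustration.
const MIN_COHORT_SIZE = 3;

interface NormalisedMetric {
  value: number;
  /** False when a guard fired and the raw value was passed through. */
  normalised: boolean;
}

function normaliseMetric(
  raw: number,
  roleMedian: number | null,
  cohortSize: number,
): NormalisedMetric {
  // Thin or empty cohort (including unknown roles): keep flat thresholds.
  if (cohortSize < MIN_COHORT_SIZE) return { value: raw, normalised: false };
  // Missing or zero baseline: skip normalisation, which also avoids dividing by zero.
  if (roleMedian === null || roleMedian === 0) return { value: raw, normalised: false };
  // Assumed formula: express the raw value relative to the role's median.
  return { value: raw / roleMedian, normalised: true };
}

// 72% completion against a 70% role norm sits above 1, against a 92% norm below 1.
const versusLowNorm = normaliseMetric(0.72, 0.7, 5);
const versusHighNorm = normaliseMetric(0.72, 0.92, 5);
```

Returning a flag alongside the value lets the caller tell a passed-through raw metric from a normalised one, so a guard firing never silently changes which thresholds apply.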

Design decision: BaselineNorms in health.ts

applyRoleBaseline lives in lib/health.ts. To avoid a circular import (health-baselines.ts → health.ts → health-baselines.ts), a minimal BaselineNorms interface is defined in health.ts. TypeScript's structural typing means RoleBaseline satisfies it automatically — no explicit extends needed, no coupling.
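
The structural-typing point can be illustrated in a single file. The sketch below assumes `BaselineNorms` carries the three median fields plus `cohortSize`; the PR names those fields but does not show the interface itself. The extra `role` and `computedAt` fields on `RoleBaseline` and the helper `hasUsableCohort` are hypothetical.

```typescript
// Sketch only: the field lists are assumptions based on names in the PR text.

// Would live in lib/health.ts, which imports nothing from health-baselines.ts.
interface BaselineNorms {
  medianCompletionRate: number | null;
  medianErrorDensity: number | null;
  medianWeeklyThroughput: number | null;
  cohortSize: number;
}

// Would live in lib/health-baselines.ts. Note there is no `extends BaselineNorms`.
interface RoleBaseline {
  role: string; // hypothetical extra field
  computedAt: number; // hypothetical extra field
  medianCompletionRate: number | null;
  medianErrorDensity: number | null;
  medianWeeklyThroughput: number | null;
  cohortSize: number;
}

// Hypothetical consumer on the health.ts side that only knows BaselineNorms.
function hasUsableCohort(norms: BaselineNorms, minCohortSize = 3): boolean {
  return norms.cohortSize >= minCohortSize;
}

const testerBaseline: RoleBaseline = {
  role: "tester",
  computedAt: 0,
  medianCompletionRate: 0.7,
  medianErrorDensity: 0.1,
  medianWeeklyThroughput: 12,
  cohortSize: 4,
};

// Accepted because RoleBaseline has every member BaselineNorms requires.
const usable = hasUsableCohort(testerBaseline);
```

Because the compiler checks shape rather than declared ancestry, the import only needs to run in one direction, from health-baselines.ts to health.ts.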

Feature flag

Set ROLE_BASELINES_ENABLED=true to activate. Recommended activation sequence:

  1. Deploy (flag off) — verify no regressions
  2. Enable on staging — compare badge distributions before/after
  3. Enable on production — monitor badge flip rate per role for 2–3 weeks
  4. Target: 30–40% reduction in noisy "declining" badges for low-volume roles (tester, researcher)
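
For reference, a flag check of this kind could look like the sketch below. It assumes the flag is compared strictly against the string "true", so values such as "1" or "TRUE" leave the feature off; the PR does not say how health-cache.ts parses the variable. `isRoleBaselinesEnabled`, `selectMetrics`, and the `Env` type are hypothetical, and the environment is passed in as an argument rather than read from `process.env` so the sketch stays self-contained.

```typescript
// Sketch only: the flag name comes from the PR; everything else is hypothetical.
type Env = Record<string, string | undefined>;

/** True only when the flag is exactly "true"; unset or any other value means off. */
function isRoleBaselinesEnabled(env: Env): boolean {
  return env["ROLE_BASELINES_ENABLED"] === "true";
}

/** Applies the normaliser when the flag is on, otherwise returns raw metrics untouched. */
function selectMetrics<T>(env: Env, raw: T, normalise: (metrics: T) => T): T {
  return isRoleBaselinesEnabled(env) ? normalise(raw) : raw;
}

// Example with a stand-in normaliser that divides by an assumed role median of 0.5.
const flagOff = selectMetrics({}, 0.4, (m) => m / 0.5);
const flagOn = selectMetrics({ ROLE_BASELINES_ENABLED: "true" }, 0.4, (m) => m / 0.5);
```

Defaulting to off for every value except "true" matches the "off by default" behaviour and keeps a mistyped value from enabling normalisation by accident.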

Prerequisites

Test plan

  • __tests__/unit/health-baselines.test.ts — 34 new tests covering medianOf, computeRoleBaseline, getRoleBaseline cache TTL/invalidation, and all applyRoleBaseline normalisation rules and guards
  • __tests__/unit/health-cache.test.ts — 4 new integration tests: flag off → raw metrics, flag on → normalised metrics, unknown agent → graceful degradation, cache hit after normalisation
  • All 201 existing tests pass unchanged (zero regressions)

🤖 Generated with Claude Code

JLRansom and others added 2 commits March 18, 2026 22:13
Add lib/health-baselines.ts — a dedicated module that computes and caches
30-day median baselines (completionRate, errorDensity, weeklyThroughput) per
agent role. A new applyRoleBaseline() in lib/health.ts normalises each
agent's raw sub-metrics relative to its role cohort's median before badge
thresholds are applied, so a tester at 72% completion looks healthy if
testers average 70% but alarming if they average 92%.

Key design decisions:
- BaselineNorms interface lives in health.ts (not health-baselines.ts) to
  avoid a circular import; structural typing lets RoleBaseline satisfy it
- MIN_COHORT_SIZE = 3 guard falls back to flat thresholds for thin cohorts
- BASELINE_TTL_MS = 5 min cache; BASELINE_WINDOW_MS = 30 days look-back
- Feature-flagged via ROLE_BASELINES_ENABLED env var (default off) for
  safe A/B comparison and emergency rollback without a code revert
- Explicit degradation ladder: cohort < 3 → flat thresholds; agent < 5
  runs → skip; baseline metric null or 0 → skip; DB error → stale cache
  or null-metrics; new role → cohort = 0 → flat thresholds

Files changed:
- lib/health-baselines.ts (new)
- lib/health.ts (+applyRoleBaseline, BaselineNorms, MIN_COHORT_SIZE_FOR_BASELINE)
- lib/health-cache.ts (role lookup + feature flag integration)
- lib/db/repositories/taskRunRepo.ts (+dbGetTaskRunsByRole)
- __tests__/unit/health-baselines.test.ts (new, 34 tests)
- __tests__/unit/health-cache.test.ts (extended, +4 integration tests)
- context/decisions.md (ADR-040)
- docs/agent-types.md (new role recalibration requirement documented)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JLRansom JLRansom merged commit 5f03156 into master Mar 19, 2026
2 checks passed
@JLRansom JLRansom deleted the feat/role-aware-health-baselines branch March 19, 2026 03:22