judges: enum-type JudgeVerdict.drift_kind / severity #443

Merged: pedapudi merged 2 commits into main from judge-verdict-enums on May 16, 2026

@pedapudi
Owner

Motivation

JudgeVerdict is the public result type returned by every Judge.evaluate(...). Its two drift-flavoured fields, drift_kind and severity, were declared as bare str and documented only as "lowercase strings matching DriftKind / DriftSeverity".

DriftKind and DriftSeverity are already public enums (goldfive.DriftKind, goldfive.DriftSeverity). Leaving the verdict fields as untyped strings means:

  • every caller building a verdict hand-writes a magic string ("tool_error", "critical") with no type-checker support and no autocomplete;
  • every caller consuming a verdict has to know the string is "really" an enum value and coerce it themselves;
  • a typo in the string silently yields a different (or invalid) drift kind.

This PR makes the judge API fully enum-typed so no integration has to round-trip through magic strings.

Change

JudgeVerdict.drift_kind / severity now accept the real DriftKind / DriftSeverity enums — the preferred, typed form:

JudgeVerdict(drift_emitted=True, drift_kind=DriftKind.TOOL_ERROR, severity=DriftSeverity.CRITICAL)

The legacy lowercase-string form is still accepted for back-compat. A frozen-safe JudgeVerdict.__post_init__ normalises a string that matches a known enum value to the enum, so a verdict built either way exposes a real enum on verdict.drift_kind / verdict.severity. Because both enums are StrEnum, the normalised value still compares equal to its lowercase string — existing string-equality consumers (and the existing test corpus) see no behavioural change. An empty string (no drift) or an unrecognised custom string is left untouched, so forward-compatible / domain-specific judges are not broken.
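The normalisation described above can be sketched roughly as follows. This is a minimal illustration, not the code in judges/base.py: the enum classes are stand-ins carrying only the two members named in this description, the helper name `_normalize` and the `drift_emitted` default are assumptions, and the `StrEnum` fallback exists only so the sketch also runs on interpreters older than Python 3.11.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Type, Union

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:
    class StrEnum(str, Enum):  # fallback with the same equality behaviour
        pass


# Stand-in enums: only the members named in the PR text are included.
class DriftKind(StrEnum):
    TOOL_ERROR = "tool_error"


class DriftSeverity(StrEnum):
    CRITICAL = "critical"


def _normalize(value: Union[Enum, str], enum_cls: Type[Enum]) -> Union[Enum, str]:
    """Return the matching enum member, or the value unchanged if none matches."""
    try:
        return enum_cls(value)  # accepts a member or a matching string
    except ValueError:
        return value  # "" and unrecognised custom strings pass through


@dataclass(frozen=True)
class JudgeVerdict:
    drift_emitted: bool = False  # default is an assumption
    drift_kind: Union[DriftKind, str] = ""
    severity: Union[DriftSeverity, str] = ""

    def __post_init__(self) -> None:
        # The dataclass is frozen, so plain assignment would raise;
        # object.__setattr__ bypasses the frozen guard.
        object.__setattr__(self, "drift_kind", _normalize(self.drift_kind, DriftKind))
        object.__setattr__(self, "severity", _normalize(self.severity, DriftSeverity))


legacy = JudgeVerdict(drift_emitted=True, drift_kind="tool_error", severity="critical")
typed = JudgeVerdict(drift_emitted=True, drift_kind=DriftKind.TOOL_ERROR,
                     severity=DriftSeverity.CRITICAL)
custom = JudgeVerdict(drift_emitted=True, drift_kind="my_domain_drift")
```

In this sketch `legacy` and `typed` end up holding the same enum members, while `custom.drift_kind` stays a plain string, mirroring the forward-compat behaviour described above.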

  • judges/base.py — field annotations widened to DriftKind | str / DriftSeverity | str; _normalize_drift_kind / _normalize_drift_severity helpers; __post_init__ applies them via object.__setattr__ (the dataclass is frozen); docstring updated.
  • judges/builtins.py — the built-in judge wrappers pass the DriftEvent enums straight through with no str() round-trip.
  • steerer.py — the verdict-consuming path handles either shape. _drift_from_judge_verdict feeds the value directly to DriftKind(...) / DriftSeverity(...) (both accept an enum member or a string); _emit_judgement flattens either shape to the plain-string proto field via a new _enum_or_str_value helper.
  • Tests — extended test_pluggable_judges.py covering both directions: enum-form construction, legacy-string normalisation, the empty-default sentinel, unrecognised-string pass-through, and a steerer evaluate_judges round-trip on an enum-typed drift verdict.

Verification

  • Full suite green: 2297 passed, 127 skipped (the skips are pre-existing optional-extra gates).
  • ruff check clean on all changed files.
  • mypy clean on judges/base.py + judges/builtins.py. (steerer.py has one pre-existing getattr typing nit unrelated to this change — confirmed present on main.)

pedapudi added 2 commits May 16, 2026 11:55
JudgeVerdict's two drift-flavoured fields were bare lowercase str
values documented only as "matching DriftKind / DriftSeverity". Every
caller building or consuming a verdict had to hand-write the magic
string and could not lean on the type checker.

Both fields now accept the real DriftKind / DriftSeverity enums (the
preferred, typed form). The legacy lowercase-string form is still
accepted for back-compat: JudgeVerdict.__post_init__ normalises a
string that matches a known enum value to the enum, so a verdict built
either way exposes a real enum on verdict.drift_kind / verdict.severity.
Because both enums are StrEnum, the normalised value still compares
equal to its lowercase string, so existing string-equality consumers
see no behavioural change. An empty string (no drift) or an
unrecognised custom string is left untouched so forward-compatible /
domain-specific judges are not broken.

- base.py: widen the field annotations to DriftKind | str /
  DriftSeverity | str; add _normalize_drift_kind / _normalize_drift_severity
  helpers and a frozen-safe __post_init__ that applies them; update the
  docstring.
- builtins.py: the built-in judge wrappers pass the DriftEvent enums
  straight through to JudgeVerdict with no str() round-trip.
- steerer.py: the verdict-consuming path handles either shape —
  _drift_from_judge_verdict feeds the value (enum or string) directly to
  DriftKind(...) / DriftSeverity(...), and _emit_judgement flattens
  either shape to the proto string field via a new _enum_or_str_value
  helper.
- Tests both ways: enum-form construction, legacy-string normalisation,
  empty-default sentinel, unrecognised-string pass-through, and a
  steerer evaluate_judges round-trip on an enum-typed drift verdict.

Self-review of the enum-typed JudgeVerdict change surfaced two small
improvements; neither alters behaviour.

- base.py: _normalize_drift_kind / _normalize_drift_severity were
  near-identical 15-line copies differing only in the enum class.
  Collapse them into one _normalize_drift_field(value, enum_cls)
  parameterised over a constrained TypeVar (DriftKind | DriftSeverity)
  — single source of truth for the coercion logic, fully typed
  (mypy clean). __post_init__ now calls object.__setattr__
  unconditionally; the prior `is not` guard was a micro-optimisation
  with no behavioural payoff over an idempotent frozen-field write.

- test_pluggable_judges.py: add a steerer-side test pinning the
  unrecognised-drift_kind path — a verdict whose drift_kind is a
  bespoke string (left untouched by normalisation, per the
  forward-compat contract) still emits JudgementEmitted but forwards
  no DriftEvent, since _drift_from_judge_verdict cannot project it
  onto a DriftKind. This was a pre-existing coverage gap the
  enum-typing change makes worth pinning.

ruff + mypy clean; tests/test_pluggable_judges.py 31 passed.
@pedapudi pedapudi merged commit 39dd2bf into main May 16, 2026
2 checks passed
@pedapudi pedapudi deleted the judge-verdict-enums branch May 16, 2026 20:18