judges: enum-type JudgeVerdict.drift_kind / severity#443
Merged
Conversation
JudgeVerdict's two drift-flavoured fields were bare lowercase str values documented only as "matching DriftKind / DriftSeverity". Every caller building or consuming a verdict had to hand-write the magic string and could not lean on the type checker. Both fields now accept the real DriftKind / DriftSeverity enums (the preferred, typed form). The legacy lowercase-string form is still accepted for back-compat: JudgeVerdict.__post_init__ normalises a string that matches a known enum value to the enum, so a verdict built either way exposes a real enum on verdict.drift_kind / verdict.severity. Because both enums are StrEnum, the normalised value still compares equal to its lowercase string, so existing string-equality consumers see no behavioural change. An empty string (no drift) or an unrecognised custom string is left untouched so forward-compatible / domain-specific judges are not broken. - base.py: widen the field annotations to DriftKind | str / DriftSeverity | str; add _normalize_drift_kind / _normalize_drift_severity helpers and a frozen-safe __post_init__ that applies them; update the docstring. - builtins.py: the built-in judge wrappers pass the DriftEvent enums straight through to JudgeVerdict with no str() round-trip. - steerer.py: the verdict-consuming path handles either shape — _drift_from_judge_verdict feeds the value (enum or string) directly to DriftKind(...) / DriftSeverity(...), and _emit_judgement flattens either shape to the proto string field via a new _enum_or_str_value helper. - Tests both ways: enum-form construction, legacy-string normalisation, empty-default sentinel, unrecognised-string pass-through, and a steerer evaluate_judges round-trip on an enum-typed drift verdict.
Self-review of the enum-typed JudgeVerdict change surfaced two small improvements; neither alters behaviour. - base.py: _normalize_drift_kind / _normalize_drift_severity were near-identical 15-line copies differing only in the enum class. Collapse them into one _normalize_drift_field(value, enum_cls) parameterised over a constrained TypeVar (DriftKind | DriftSeverity) — single source of truth for the coercion logic, fully typed (mypy clean). __post_init__ now calls object.__setattr__ unconditionally; the prior `is not` guard was a micro-optimisation with no behavioural payoff over an idempotent frozen-field write. - test_pluggable_judges.py: add a steerer-side test pinning the unrecognised-drift_kind path — a verdict whose drift_kind is a bespoke string (left untouched by normalisation, per the forward-compat contract) still emits JudgementEmitted but forwards no DriftEvent, since _drift_from_judge_verdict cannot project it onto a DriftKind. This was a pre-existing coverage gap the enum-typing change makes worth pinning. ruff + mypy clean; tests/test_pluggable_judges.py 31 passed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
JudgeVerdictis the public result type returned by everyJudge.evaluate(...). Its two drift-flavoured fields —drift_kindandseverity— were declared as bare lowercasestr, documented only as "lowercase strings matchingDriftKind/DriftSeverity".DriftKindandDriftSeverityare already public enums (goldfive.DriftKind,goldfive.DriftSeverity). Leaving the verdict fields as untyped strings means:"tool_error","critical") with no type-checker support and no autocomplete;This makes the judge API fully enum-typed so no integration has to round-trip through magic strings.
Change
JudgeVerdict.drift_kind/severitynow accept the realDriftKind/DriftSeverityenums — the preferred, typed form:The legacy lowercase-string form is still accepted for back-compat. A frozen-safe
JudgeVerdict.__post_init__normalises a string that matches a known enum value to the enum, so a verdict built either way exposes a real enum onverdict.drift_kind/verdict.severity. Because both enums areStrEnum, the normalised value still compares equal to its lowercase string — existing string-equality consumers (and the existing test corpus) see no behavioural change. An empty string (no drift) or an unrecognised custom string is left untouched, so forward-compatible / domain-specific judges are not broken.judges/base.py— field annotations widened toDriftKind | str/DriftSeverity | str;_normalize_drift_kind/_normalize_drift_severityhelpers;__post_init__applies them viaobject.__setattr__(the dataclass isfrozen); docstring updated.judges/builtins.py— the built-in judge wrappers pass theDriftEventenums straight through with nostr()round-trip.steerer.py— the verdict-consuming path handles either shape._drift_from_judge_verdictfeeds the value directly toDriftKind(...)/DriftSeverity(...)(both accept an enum member or a string);_emit_judgementflattens either shape to the plain-string proto field via a new_enum_or_str_valuehelper.test_pluggable_judges.pycovering both directions: enum-form construction, legacy-string normalisation, the empty-default sentinel, unrecognised-string pass-through, and a steererevaluate_judgesround-trip on an enum-typed drift verdict.Verification
2297 passed, 127 skipped(127 skipped are pre-existing optional-extra gates).ruff checkclean on all changed files.mypyclean onjudges/base.py+judges/builtins.py. (steerer.pyhas one pre-existinggetattrtyping nit unrelated to this change — confirmed present onmain.)