feat(ruleset_strategy): add round_score "rank" (per-episode placement scoring) #48
Merged
… scoring)

The ruleset_strategy commissioner scored every round by the mean of each policy's per-episode scores. Add an opt-in `scoring.round_score: "rank"` mode: within each episode, policies are ranked by score and earn N..1 rank points (the winner of an N-policy episode gets N, last gets 1, ties share the better place), and a policy's round score is the mean of those rank points across the episodes it played. Margins of victory are discarded; only placement matters.

complete_round now delegates per-policy round scoring to an overridable _round_scores_by_policy; the base keeps mean scoring, and RulesetStrategyCommissioner switches to rank points when configured.

Rank rounds are tagged with a distinct score_kind, so switching a league from mean to rank filters the now-incomparable prior-regime results off the leaderboard instead of blending score scales.

Default stays "mean", so existing configs are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
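The rank scheme in the commit message can be illustrated with a small standalone sketch. It is not the code from this PR: the function names and the `dict` shape (policy name to episode score) are assumptions made for the example. It only shows the stated rule, where a policy earns N minus the number of strictly higher scores in an N-policy episode, so tied policies share the better place.

```python
from collections import defaultdict
from statistics import mean


def episode_rank_points(scores: dict[str, float]) -> dict[str, int]:
    """Rank points for one episode: N minus the count of strictly higher scores."""
    n = len(scores)
    values = list(scores.values())
    return {
        policy: n - sum(1 for other in values if other > score)
        for policy, score in scores.items()
    }


def rank_round_scores(episodes: list[dict[str, float]]) -> dict[str, float]:
    """Mean rank points per policy, over only the episodes that policy played."""
    points = defaultdict(list)
    for episode in episodes:
        for policy, pts in episode_rank_points(episode).items():
            points[policy].append(pts)
    return {policy: mean(pts) for policy, pts in points.items()}


# Hypothetical episode data (policy name -> raw episode score).
episodes = [
    {"a": 10.0, "b": 3.0, "c": 3.0},  # a earns 3; b and c tie and both earn 2
    {"a": 7.0, "b": 1.0},             # a earns 2; b earns 1; c did not play
]
round_scores = rank_round_scores(episodes)
print(round_scores)  # {'a': 2.5, 'b': 1.5, 'c': 2}
```

Note that the size of the win in the first episode (10.0 against 3.0) has no effect on the result, and that N is taken per episode, so the winner of a two-policy episode earns 2 while the winner of a three-policy episode earns 3.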
"Rounds rank policies by placement within each episode rather than by raw score: in an episode with N "
"policies the highest-scoring policy earns N points and the lowest earns 1 (ties share the better place), and "
"a policy's round score is the average of those rank points across the episodes it played. Margins of victory "
"are discarded — only who beat whom each game matters. The division leaderboard combines completed rounds with "
Contributor
@KyleHerndon note that right now, with division leaderboard computation and commissioner leaderboard computation split the way they are, and with our UI only reflecting the commissioner-reported description, we force the commissioner to leak abstractions by describing how its roundresults get managed by app-backend
Contributor
Sorry, more plainly: ideally we wouldn't need commissioners to say that their roundresults get 2h-ewma'd; they shouldn't need to know about or speak about it, and can't enforce it
nishu-builder
approved these changes
Jun 22, 2026
What
Adds an opt-in rank-by-episode round scoring mode to the ruleset_strategy commissioner. Requested for the agricogla league (the metta-side config + image bump is a follow-up).

`scoring.round_score: "rank"`
`ScoringConfig.round_score` previously only accepted `"mean"` (round score = mean of a policy's per-episode scores). Adds `"rank"`: within each episode, policies are ranked by score and earn N..1 rank points, with ties sharing the better place (points = N - (#strictly-higher)). A policy's round score is the mean of those rank points across the episodes it played.

How
- `complete_round` now delegates per-policy round scoring to an overridable `_round_scores_by_policy(entries, episode_results) -> (scores, ranked_counts)`. The base `BaselineCommissioner` keeps mean scoring; `RulesetStrategyCommissioner` switches to rank points when `scoring.round_score == "rank"`. The ranking/metadata assembly is unchanged.
- Rank rounds are tagged with a distinct `score_kind` (`rank_episode_round_score`) via `RankingConfig.result_metadata` / `filter_metadata`, so switching a league from `mean` → `rank` filters the now-incomparable prior-regime round results off the commissioner leaderboard instead of blending two score scales.
- `scoring_mechanics` describes the rank scheme for the division description.

Default stays `"mean"`, so every existing config behaves exactly as before.

Tests
- `test_ruleset_strategy_rank_round_score_uses_per_episode_placement`: same episode inputs as the existing mean test; asserts that round scores become the mean per-episode rank points and that the rank `score_kind` tag is applied.
- … `utils.py` are untouched).

Follow-up (not in this PR)
- … `agricogla-commissioner.yaml` to `scoring: {round_score: rank}` and bump the `commissioners-default` image digest once this is merged + the image is rebuilt/published.
- … `score`, which becomes mean per-episode rank points.

🤖 Generated with Claude Code
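The delegation described under How can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: only the names `complete_round`, `_round_scores_by_policy`, `BaselineCommissioner`, `RulesetStrategyCommissioner`, and the `rank_episode_round_score` tag come from the description. The constructor argument, the episode-result shape, the meaning of `ranked_counts` (taken here as the number of episodes that contributed), the `"mean"` tag value, and the `comparable_results` helper are all hypothetical.

```python
from statistics import mean

# Assumed shape: each episode result maps policy name -> raw episode score.
EpisodeResult = dict[str, float]


class BaselineCommissioner:
    score_kind = "mean"  # hypothetical tag value for the default regime

    def _round_scores_by_policy(self, entries, episode_results):
        """Default hook: mean of each policy's per-episode scores."""
        scores, ranked_counts = {}, {}
        for policy in entries:
            played = [ep[policy] for ep in episode_results if policy in ep]
            if played:
                scores[policy] = mean(played)
                ranked_counts[policy] = len(played)  # assumed meaning
        return scores, ranked_counts

    def complete_round(self, entries, episode_results):
        # Scoring is delegated to the hook; ranking assembly stays shared.
        scores, ranked_counts = self._round_scores_by_policy(entries, episode_results)
        ranking = sorted(scores, key=scores.get, reverse=True)
        return {"ranking": ranking, "scores": scores,
                "ranked_counts": ranked_counts, "score_kind": self.score_kind}


class RulesetStrategyCommissioner(BaselineCommissioner):
    def __init__(self, round_score="mean"):  # hypothetical constructor
        self.round_score = round_score
        if round_score == "rank":
            self.score_kind = "rank_episode_round_score"

    def _round_scores_by_policy(self, entries, episode_results):
        if self.round_score != "rank":
            return super()._round_scores_by_policy(entries, episode_results)
        # Convert raw scores to rank points (N - #strictly-higher), then average.
        as_points = [
            {p: len(ep) - sum(1 for o in ep.values() if o > s) for p, s in ep.items()}
            for ep in episode_results
        ]
        return super()._round_scores_by_policy(entries, as_points)


def comparable_results(results, score_kind):
    """Hypothetical filter: keep only round results from the current regime."""
    return [r for r in results if r.get("score_kind") == score_kind]


episodes = [{"a": 10.0, "b": 3.0}, {"a": 1.0, "b": 2.0}, {"a": 9.0, "b": 8.0}]
mean_result = RulesetStrategyCommissioner().complete_round(["a", "b"], episodes)
rank_result = RulesetStrategyCommissioner("rank").complete_round(["a", "b"], episodes)
kept = comparable_results([mean_result, rank_result], "rank_episode_round_score")
```

Keeping the ranking assembly in the base class and overriding only the scoring hook means the default path is untouched when `round_score` is `"mean"`, which matches the stated goal that existing configs behave as before.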