Skip to content

fix: propagate model/variant/thinking to sweep aggregate summary#59

Merged
pruiz merged 2 commits into
masterfrom
fix/sweep-summary-model-propagation
Jun 15, 2026
Merged

fix: propagate model/variant/thinking to sweep aggregate summary#59
pruiz merged 2 commits into
masterfrom
fix/sweep-summary-model-propagation

Conversation

@pruiz

@pruiz pruiz commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Problem

The aggregate sweep summary step (run_sweep_summary in tools/run-sweep.py) was using raw opencode run without resolving CodeCome's model configuration. This meant:

  • CODECOME_MODEL / CODECOME_MODEL_VARIANT env vars
  • OPENCODE_ARGS='--model ... --variant ...'
  • codecome.yml agent pinning for auditor

...were all ignored for the aggregate step, while per-file sweep runs (which go through tools/run-agent.py) honoured them correctly.

When the pinned model is unsupported by the user's account (e.g. gpt-5.5-cyber-preview with a ChatGPT account), per-file sweeps work fine but the aggregate step fails with a bad request error because raw opencode run falls back to the wrong model.

Fix

run_sweep_summary() now calls resolve_runtime_config("auditor") (same resolution stack as per-file runs) and passes explicit --model, --variant, and --thinking flags to the raw opencode run command. It also prints the resolution source in the banner for visibility.

Tests

Three new tests in TestRunSweepSummaryModelPropagation:

  • test_passes_model_and_variant_flags — verifies --model and --variant are passed when resolved
  • test_passes_thinking_flag_when_on — verifies --thinking is passed when thinking is on
  • test_no_flags_when_nothing_resolved — verifies no extra flags when nothing is resolved

Verification

  • All 19 sweep tests pass
  • 841/848 total tests pass (7 pre-existing flaky mock_llm_parity failures from opencode serve 1.17.7 timing bug)
  • Frontmatter checks pass for all 29 findings

Summary by CodeRabbit

  • Bug Fixes
    • Sweep summary generation now properly applies configured model, variant, and thinking flag settings when executing analysis commands.

The aggregate sweep rollup step (run_sweep_summary) was using raw
'opencode run' without resolving CodeCome's model configuration,
which meant CODECOME_MODEL, CODECOME_MODEL_VARIANT, OPENCODE_ARGS,
and codecome.yml agent pinning were all ignored for the aggregate
step while the per-file sweeps honoured them correctly.

Now resolve_runtime_config('auditor') is called and the resolved
model, variant, and thinking flags are passed explicitly to the
raw opencode run command, matching the per-file sweep behaviour.

Also prints the resolution source in the banner for visibility.

Adds three regression tests covering model+variant, thinking-only,
and nothing-resolved scenarios.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@pruiz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 36 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5c1888b2-68ac-4ec1-9a36-c3c6e4126cb6

📥 Commits

Reviewing files that changed from the base of the PR and between 00bfcbd and bc4774a.

📒 Files selected for processing (1)
  • tests/test_run_sweep.py
📝 Walkthrough

Walkthrough

run_sweep_summary in tools/run-sweep.py now imports and calls resolve_runtime_config("auditor"), then conditionally appends --model, --variant, and --thinking to the opencode run command. A new test class TestRunSweepSummaryModelPropagation with env helpers and three tests covers all three flag-combination scenarios.

Changes

Dynamic model flag propagation in sweep summary

Layer / File(s) Summary
resolve_runtime_config import and conditional flag construction
tools/run-sweep.py
Adds import for resolve_runtime_config and rewrites the opencode run command in run_sweep_summary to conditionally append --model, --variant, and --thinking based on auditor runtime config values.
TestRunSweepSummaryModelPropagation test suite
tests/test_run_sweep.py
Adds _setup_summary_env/_teardown_summary_env helpers and three tests that monkeypatch resolve_runtime_config and subprocess.run to assert correct flag inclusion/omission for model+variant, thinking-only, and no-flags scenarios.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hippity hop, the flags now flow,
--model and --variant join the show.
When thinking's on, that flag appears,
When nothing resolves, the command clears.
Three tests now guard what the rabbit built,
No broken sweep shall cause me guilt! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: propagate model/variant/thinking to sweep aggregate summary' clearly and concisely describes the main change—resolving and passing model configuration to the sweep aggregate summary step. It is specific, directly related to the changeset, and accurately reflects the primary fix.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sweep-summary-model-propagation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Coverage Report

Metric Value
Line Coverage 76.1%
Lines Covered 0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-15T11:45:07.450Z

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes run_sweep_summary in tools/run-sweep.py so that the aggregate sweep step honours the same model/variant/thinking configuration stack (env vars, codecome.yml agent pinning) already used by per-file sweep runs via run-agent.py.

  • run_sweep_summary now calls resolve_runtime_config("auditor") and conditionally appends --model, --variant, and --thinking to the opencode run command, matching the behaviour of the per-file path.
  • Three new focused tests stub both the resolver and subprocess.run to verify each flag combination independently.

Confidence Score: 5/5

Safe to merge — the change is a focused, self-contained bug fix with no side effects on the per-file sweep path.

The fix is minimal: one new import, one resolver call, and three conditional flag appends, all guarded by truthiness checks. The three new tests stub both the resolver and the subprocess call, covering every branch (model+variant, thinking-only, nothing resolved). No pre-existing behaviour is altered — the prompt text is still correctly appended last, and dry_run gating at the call site in main() remains unchanged.

No files require special attention.

Important Files Changed

Filename Overview
tools/run-sweep.py Adds resolve_runtime_config("auditor") call in run_sweep_summary and conditionally appends --model, --variant, --thinking flags to the opencode run command; also prints resolution details in the banner.
tests/test_run_sweep.py Adds TestRunSweepSummaryModelPropagation with three tests covering model+variant flags, thinking flag, and the no-flags baseline; uses monkeypatch to stub resolve_runtime_config and subprocess.run.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["run_sweep_summary(files, summaries)"] --> B["resolve_runtime_config('auditor')"]
    B --> C{rc.model?}
    C -- yes --> D["extend(['--model', rc.model])"]
    C -- no --> E{rc.variant?}
    D --> E
    E -- yes --> F["extend(['--variant', rc.variant])"]
    E -- no --> G{rc.thinking_on?}
    F --> G
    G -- yes --> H["append('--thinking')"]
    G -- no --> I["append(prompt_text)"]
    H --> I
    I --> J["subprocess.run(command, cwd=ROOT)"]
Loading

Reviews (2): Last reviewed commit: "chore: address review feedback - drop de..." | Re-trigger Greptile

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tools/run-sweep.py (1)

194-225: 🏗️ Heavy lift

Move aggregate summary runtime-resolution logic out of the root script and delegate to a package module.

This change adds more implementation behavior directly in tools/run-sweep.py; per repo policy, root-level tools should stay delegation-only wrappers. Please move run_sweep_summary’s config resolution + command assembly into a package module and have this script call it.

As per coding guidelines, “Standalone scripts at the tools/ root … should be thin wrappers that delegate to their respective packages. Implementation must live in the package, not the script.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/run-sweep.py` around lines 194 - 225, Move the implementation logic
from the run_sweep_summary function in tools/run-sweep.py to a package module.
Specifically, extract the config resolution with resolve_runtime_config, the
command assembly logic that builds the list with opencode run and the optional
model/variant/thinking arguments, and the subprocess.run execution into a new
function within the package. Have run_sweep_summary in the root script become a
thin wrapper that calls this new package function, keeping only the print
statements and file path handling in the script itself.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_run_sweep.py`:
- Line 267: The lambda stubs for resolve_runtime_config at lines 267, 309, and
344 ignore the agent input parameter, which means they cannot detect if the
resolver is being called with an incorrect role. Replace each of these three
stubs (in tests/test_run_sweep.py at lines 267, 309, and 344) with
implementations that assert the agent parameter contains the auditor role before
returning fake_rc. This will ensure that role-routing regressions are caught
during testing instead of going undetected.

---

Nitpick comments:
In `@tools/run-sweep.py`:
- Around line 194-225: Move the implementation logic from the run_sweep_summary
function in tools/run-sweep.py to a package module. Specifically, extract the
config resolution with resolve_runtime_config, the command assembly logic that
builds the list with opencode run and the optional model/variant/thinking
arguments, and the subprocess.run execution into a new function within the
package. Have run_sweep_summary in the root script become a thin wrapper that
calls this new package function, keeping only the print statements and file path
handling in the script itself.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: abbe8220-71c1-42b9-8866-d5f5dfbfe1e8

📥 Commits

Reviewing files that changed from the base of the PR and between f0c0431 and 00bfcbd.

📒 Files selected for processing (2)
  • tests/test_run_sweep.py
  • tools/run-sweep.py

Comment thread tests/test_run_sweep.py Outdated
Comment thread tests/test_run_sweep.py Outdated

@pruiz pruiz left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing remaining review items:

@coderabbitai nitpick (move to package module): Deferred.

The suggestion to extract run_sweep_summary into a package module is architecturally valid and consistent with tools/AGENTS.md. However, tools/run-sweep.py is already a 280-line implementation-heavy script (argparse, file resolution, prompt building, subprocess orchestration) — it was never a thin wrapper. The 10 lines added here are fully consistent with the existing pattern. A proper package extraction of the sweep runner belongs in a separate architectural refactor PR, not as a blocking concern for this focused bug fix.

@coderabbitai inline (assert agent role): Fixed in bc4774a. All three lambda stubs now assert agent == "auditor".

@greptile-apps inline (dead monkeypatch param): Fixed in bc4774a. Removed from signature and call sites.

@pruiz pruiz merged commit 406089f into master Jun 15, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant